NewzBot is a collection of
programs
I wrote to hunt around the Internet for hosts that allow
public access to their USENET servers. It's been on the web since
roughly November 1995, and has gone through a few changes over the
past few years, but its core functionality and goal haven't really changed.
The contents of USENET are distributed among many different
USENET servers. When two servers talk to each other, they use
a protocol called "NNTP" -- Network News Transfer Protocol -- documented by
RFC977
to
exchange articles: "oh, you have this message in misc.forsale.cats?
OK, send that to me. I have these articles in alt.fan.bill-gates.die.die.die".
When you want to read USENET news, you use a program called a newsreader
(most web browsers have integrated newsreaders)
which speaks NNTP
to a USENET server: "Show me the last 50 messages in rec.aquaria";
"Show me message 94943 in alt.sex.nfs".
The majority of USENET servers are configured to only allow connections
from local users -- and most USENET servers carry only a subset
of USENET groups (one server might have the entire
comp.os.* hierarchy,
another might have all of
misc.* except
misc.jobs.offered, etc.).
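To make the newsreader conversation described above a bit more concrete, here is a minimal Python sketch of what a client might send for "show me the last 50 messages in rec.aquaria". It assumes the RFC 977 GROUP reply format ("211 count first last group"). The function names and the sample reply are made up for illustration, and nothing here talks to a real server.

```python
def parse_group_reply(line):
    """Parse an RFC 977 GROUP reply: '211 count first last group'."""
    parts = line.split()
    if len(parts) < 5 or parts[0] != "211":
        raise ValueError("unexpected GROUP reply: " + line)
    count, first, last = int(parts[1]), int(parts[2]), int(parts[3])
    return count, first, last, parts[4]


def last_n_commands(group_reply, n=50):
    """Build the ARTICLE commands a client could send for the newest n articles."""
    count, first, last, _group = parse_group_reply(group_reply)
    if count == 0:
        return []
    start = max(first, last - n + 1)
    return ["ARTICLE %d" % num for num in range(start, last + 1)]


# Hypothetical reply to "GROUP rec.aquaria" (numbers invented for the example).
reply = "211 120 94824 94943 rec.aquaria"
commands = last_n_commands(reply)
print(len(commands), commands[0], commands[-1])
```

A real newsreader would also have to cope with article numbers in that range that no longer exist on the server.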
That's where newzbot comes in. If your friend tells you about
this great new group called
comp.unix.bsd.openbsd, but your
local news server doesn't have this group, you can use newzbot
to find a public server that does.
Or, if you don't have access to a news server at all (say you get
service from a small ISP that can't afford a USENET news server)
you can use this page to find a news server.
Can I reprint or spider your list?
Technically, there's nothing stopping you from doing so,
but if you do, I ask that you give proper credit -- namely, mention
the URL of my page (http://www.newzbot.com/).
If your publication exists on the WWW, I strongly suggest you
link to the "real" newzbot page rather than copying my list directly. These
pages are updated automatically and any copy you publish will
be out of date within a week.
If you want to spider the site, please make sure your application
obeys robots.txt! I've had a few people decide that it would
be krad to rapidly mirror the entire site, including the search
engine & web-based newsreader in /cgi-bin. I'm not sure if they realize
they're mirroring an entire USENET spool; in any case, when I
see people do this, I ban them, since it wastes my bandwidth.
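For anyone writing a spider, here is a rough Python sketch of checking robots.txt before fetching a URL, using the standard library's urllib.robotparser. The robots.txt contents and the user agent name below are assumptions made up for the example; the real file on the site may say something different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the real file on the site may differ.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def build_checker(robots_txt):
    """Return a parser loaded from robots.txt text (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


checker = build_checker(SAMPLE_ROBOTS_TXT)
agent = "example-spider"  # made-up user agent name

for url in ("http://www.newzbot.com/", "http://www.newzbot.com/cgi-bin/search"):
    allowed = checker.can_fetch(agent, url)
    print(url, "->", "fetch" if allowed else "skip")
```

A polite spider would also pause between requests rather than fetching pages as fast as it can.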
The server I found on your page doesn't work anymore!
While the majority of the news servers on the list are intentionally
public, there are servers that are "accidentally" public, usually
because the systems administrator screwed up during the server
configuration process.
If a server is still listed as "good" but you suspect the server
has closed down or is passworded, you can force newzbot to re-test
the server by using the search page to
locate the server; if the server hasn't been tested in the past 24
hours, you will have the option to have it re-tested. This may
take a while, so you may have to come back in a day or two. (The
scan status page will show
you which (if any) servers are currently being scanned.)
If, after the server has been re-tested (you'll know because the
host information page will say something like "This news server
was last successfully probed on [some recent date]") it still doesn't
work, please drop a note to webmaster @ newzbot.com & include the
hostname or IP address of the server and I will figure out what's
going on.
If the server claims to have a group but doesn't seem to have
any articles in this group, this is not a bug in newzbot.
Some servers are "post-only" and have zero articles. You can identify and
avoid these servers by looking at the
average article retention field.
I can't get any of these hosts to work! What's wrong?
Several things could explain this. You may be behind a firewall which
restricts outbound NNTP to only a select number of hosts. You
may have your newsreader configured incorrectly. Your connection
to the Internet may have just exploded. It's also possible that
every site on my list has expired. That last possibility is highly unlikely, but
if you think this is the case, wait a week and then check my page
again.
I don't have the resources to walk people through the configuration
of their newsreaders, or determine if they're behind a firewall.
Ask someone who knows your particular setup -- your local network
administrator, for instance.
Do you know which server carries the group alt.foo.bar?
No, but I wrote a search engine to help you find it!
What do the various fields in the server listings mean?
News server hostname - this is the hostname or IP
address of the news server. If you click the hostname, you'll
get some additional information about this news server.
Posting - this is mostly obvious except when it isn't :-)
This field shows whether or not the server claims to accept postings.
Note that just because a server accepts a posting doesn't mean it will
actually get propagated to the global USENET. You may have to experiment
a bit to find a server that has good propagation.
Newzbot is able to test the propagation of servers by posting
test messages to designated test groups (currently
alt.test, misc.test, and microsoft.test). If the post is successful and
propagates, then this field will say "Verified", and the host
information page will have the time it took for the post to propagate
back "home" to my internal (private) news server. If the post was
accepted but has not propagated, then the field will say "Unverified."
Newzbot does not thoroughly test post-ability; the only real way to
test if posting works is to post a message to a newsgroup that the
server has and see if it is accepted. A status of "Unknown" indicates
that newzbot could not find a test group to post to. Most servers
unfortunately fall into this category; you'll have to experiment.
Groups - the number of groups that the server
claims to support. This is determined by sending an NNTP LIST command
to the server and counting the number of groups that come back.
Simple, eh? Unfortunately, some servers support lots of
groups but don't really have any articles in those groups.
If you click on the group count, and you
have a web browser that is capable of reading news (recent versions
of Netscape and Internet Explorer can), then you'll be able to read
news at this server. Otherwise, you'll have to put the server name
into your newsreader configuration where it asks for an 'NNTP server'.
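The group count described above can be sketched in a few lines of Python. This assumes the RFC 977 LIST reply format (a 215 status line, then one "group last first p" line per group, then a lone "."). The sample reply is invented, and this is only an illustration, not the code newzbot actually runs.

```python
def parse_list_response(lines):
    """Parse an NNTP LIST reply into (group, last, first, flag) tuples.

    Per RFC 977 each line is 'group last first p' and a lone '.' ends the list.
    """
    groups = []
    for line in lines:
        line = line.strip()
        if line == ".":
            break
        if not line or line.startswith("215 "):
            continue  # skip the status line and blank lines
        name, last, first, flag = line.split()[:4]
        groups.append((name, int(last), int(first), flag))
    return groups


# Invented sample reply, for illustration only.
sample = [
    "215 list of newsgroups follows",
    "comp.unix.bsd.openbsd 0000001234 0000001100 y",
    "misc.jobs.offered 0000000000 0000000001 n",
    "alt.test 0000000420 0000000400 y",
    ".",
]
groups = parse_list_response(sample)
print("groups:", len(groups))
print("postable:", sum(1 for g in groups if g[3] == "y"))
```

The last field of each line is the server's own claim about whether posting is allowed in that group.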
Speed - this is a relative speed measurement of about how
fast the server's connection is, based on my connection (I have a 512kbit DSL
connection.) It's measured by timing how long
it takes to get a LIST of newsgroups from the server. Obviously, the
larger the LIST is, the more accurate the reading-- you'll notice
that servers with few groups get a poor speed rating; it's difficult
to estimate the speed of servers that only carry 1 or 2 groups.
In situations like this, I typically attempt to measure the speed
by pulling down articles, but if the server doesn't have many articles,
again, it's hard to guess the speed. So treat this as useful but very
relative.
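One way to picture the speed measurement is as bytes received divided by elapsed time. The Python sketch below times a stand-in function instead of a real LIST request; fake_list_fetch and its canned reply are hypothetical, and the actual measurement may well be done differently.

```python
import time


def measure_throughput(fetch):
    """Time a callable that returns reply bytes; return (bytes, seconds, bytes/sec)."""
    start = time.monotonic()
    data = fetch()
    elapsed = time.monotonic() - start
    rate = len(data) / elapsed if elapsed > 0 else float("inf")
    return len(data), elapsed, rate


def fake_list_fetch():
    """Stand-in for reading a LIST reply off a socket (hypothetical)."""
    time.sleep(0.05)  # pretend network delay
    return b"alt.test 0000000420 0000000400 y\r\n" * 1000


size, seconds, rate = measure_throughput(fake_list_fetch)
print("%d bytes in %.3fs" % (size, seconds))
```

With a reply of only one or two lines, connection latency dominates the elapsed time, which is why small servers tend to get a misleadingly poor rating.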
Articles - this field only shows up in the search results --
it's the number of reported articles (via the GROUP command)
in the newsgroup.
Clicking on the article count will let you read this group,
assuming your browser has a newsreader.
Next to the article count is the average article size
of articles in this group. This is probably most useful for determining
if a news server carries binary group content (some news servers
carry binary groups, but drop messages over a certain size).
This information is not guaranteed to be accurate, since the
server sometimes does not know how many articles are in
a group, but it is about as accurate as newzbot can get
without downloading every article from every group.
Days (search engine only, experimental) - a count of the
days of news a news server carries for a particular group.
This is determined by collecting XOVER data per-group per-server.
This may be inaccurate, as XOVER information can drift out of
sync with the actual article contents. In addition, since NNTP
clients can control the date on their postings, this information
can be completely bogus.
A value of 'n/a' means this information has not been collected.
Last post (search engine only, experimental) - how
long ago the news server received a post in the specified group.
This is relative to the last time the server was scanned.
If a server was last scanned 10 days ago, and the last post was
11 days ago, then this means that 10 days ago, the last post was
1 day old. Servers with up-to-date (or identical) newsfeeds should
all have about the same "last post" information. If this is
a value in the near future (a negative value), then it means
the last post is in a timezone ahead of me (US/Pacific). If this
is a value in the distant past or distant future, then it means
that someone is putting bogus dates on their postings.
A value of 'n/a' means this information has not been collected.
Like "Days", this depends on XOVER information and may drift out of
sync with the actual article contents.
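The arithmetic in the "Last post" example works out as below. This is just the subtraction from the text written as Python; the function name is made up.

```python
def age_at_scan(last_post_days_ago, scanned_days_ago):
    """How old the newest post was at the moment the server was scanned.

    Both arguments are measured from today. A negative result means the
    post is dated after the scan (for example, a timezone ahead).
    """
    return last_post_days_ago - scanned_days_ago


# The example from the text: scanned 10 days ago, last post 11 days ago.
print(age_at_scan(11, 10))
# A post dated half a day after the scan gives a negative value.
print(age_at_scan(9.5, 10))
```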
Average article retention - as noted above, some servers
will support a huge number of news groups, but they won't actually
contain many articles. Average article retention attempts to
determine how many messages there are, on average, on the server.
There are two ways to do this: (a) go into a news group and attempt to
retrieve every article and count the ones that are available, or
(b) extract the high/low article numbers from the LIST command
and hope the server isn't lying to us. (a) is very bandwidth-intensive
and would take a long time, but is 100% accurate. (b) is faster
and less accurate, since a server can claim to have articles 1-128
in a group, but when you go to retrieve them, you find that only
the last 5 of those articles are actually available (the others
having been expired or otherwise purged.) I use the LIST method
now, and hopefully it will be accurate enough, since I don't
look forward to enumerating every article from a server. (let
me know if you have any suggestions on a fast & accurate way to do
this.)
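Method (b) can be sketched roughly as follows in Python: treat high minus low plus one as an upper bound on the articles in each group, then average over the groups. The sample numbers are invented and the function names are hypothetical. As the text says, the result is only as honest as the server's LIST output.

```python
def estimated_articles(last, first):
    """Upper-bound article count implied by LIST's high/low numbers."""
    return max(0, last - first + 1)


def average_retention(groups):
    """Average the per-group estimates; groups is a list of (name, last, first)."""
    if not groups:
        return 0.0
    total = sum(estimated_articles(last, first) for _name, last, first in groups)
    return total / len(groups)


# Invented numbers. The second entry has last < first, which per RFC 977
# means the group currently holds no articles.
sample = [
    ("rec.aquaria", 128, 1),
    ("misc.forsale.cats", 0, 1),
    ("alt.test", 420, 400),
]
print(average_retention(sample))
```

In the 1-128 example from the text this estimate would report 128 articles even if only the last 5 can actually be retrieved.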
Added - when newzbot first discovered the server.
Useful for seeing just how long servers have been public. All times
are in the US/Pacific timezone (PDT/PST).
Last checked - as you might have already guessed, this indicates
when the news server was last tested.
I have a news server that I want newzbot to index.
Easy enough; you can
suggest a new site for newzbot to index.
newzbot will, however, refuse to index sites that have been intentionally
or automatically excluded from scans.
How does this program work?
There are 5 primary components:
a database (MySQL),
where all the information used by the scanner is stored;
a host gathering process ('host-gather'), which scans Usenet articles
for NNTP servers and enters them in a scanning queue;
the scanner itself ('nntp-scan'), a ~1700 line perl script that
interrogates a host for NNTP services and stores the results into a
database;
a parallelizing wrapper program ('dnntp-wrapper') that pulls hosts out
of the scanning queue and executes nntp-scan, while ensuring that only
one process works on a host at a time; and
The idea is that as much is automated as possible, so I don't have
to manually sift through hosts to scan, and then hand-publish/index
the results. Automation r0x.
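The "only one process works on a host at a time" idea can be illustrated with an atomic claim against the queue table. The Python sketch below uses an in-memory sqlite3 database purely as a stand-in for MySQL. The table layout, column names, and worker names are all assumptions for the example, not the real dnntp-wrapper schema.

```python
import sqlite3


def make_queue(hosts):
    """Create an in-memory scan queue; sqlite3 stands in for MySQL here."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE queue (host TEXT PRIMARY KEY, worker TEXT)")
    db.executemany("INSERT INTO queue (host) VALUES (?)", [(h,) for h in hosts])
    db.commit()
    return db


def claim_host(db, host, worker):
    """Try to claim a host; succeeds only if no other worker holds it."""
    cur = db.execute(
        "UPDATE queue SET worker = ? WHERE host = ? AND worker IS NULL",
        (worker, host),
    )
    db.commit()
    return cur.rowcount == 1


db = make_queue(["news.example.com", "nntp.example.org"])
print(claim_host(db, "news.example.com", "worker-1"))  # first claim wins
print(claim_host(db, "news.example.com", "worker-2"))  # already taken
```

The point of the conditional UPDATE is that the check and the claim happen in one statement, so two workers cannot both think they own the same host.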
Every time newzbot does a scan, it collects not only the hosts that
are running public NNTP, but also those that say "I'm not public."
It then takes this list and automatically filters
those hosts so it doesn't scan them again. What this means
is if newzbot scans your site and you tell it to go away, it won't
come back to visit you for a long, long time (years..)
Can I get the source code to this way-cool program?
An ancient version of nntp-scan is available here.
If you have questions about this code, go grab the "Camel" book,
_Programming Perl_ from
O'Reilly & Associates.
You can even roll your own if you go grab
netcat
(courtesy of Avian Research),
and read the
Network News Transfer Protocol RFC.
If you're truly interested in how things work-- USENET in particular and
the Internet in general -- I'd really recommend rolling your own.
You can do a lot with netcat, shell scripts, and an RFC!
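If you'd rather start from a scripting language than from netcat, the first step of a roll-your-own scanner might look like the Python sketch below: connect to port 119, read the greeting, and classify it by its RFC 977 status code (200 = posting allowed, 201 = no posting, 502 = access restricted). The function names and category labels are made up here, and this is not taken from nntp-scan. Note that a friendly greeting doesn't guarantee the server will actually let you read any groups.

```python
import socket


def classify_greeting(line):
    """Map an NNTP greeting line to a rough access category (RFC 977 codes)."""
    code = line[:3]
    if code == "200":
        return "open, posting allowed"
    if code == "201":
        return "open, no posting"
    if code == "502":
        return "access denied"
    return "unknown"


def probe(host, port=119, timeout=10):
    """Connect, read the greeting, say QUIT, and classify the server."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        greeting = sock.recv(512).decode("latin-1").strip()
        sock.sendall(b"QUIT\r\n")
    return classify_greeting(greeting)


# Offline examples with invented greeting lines; probe() needs a real host.
print(classify_greeting("200 news.example.com ready (posting ok)"))
print(classify_greeting("502 You have no permission to talk. Goodbye."))
```

Only point probe() at servers you're allowed to test.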
The source for the entire site isn't available (though if I ever
get everything nicely cleaned up, I'll make it available.)
Can I buy banner ad space on your site?
If you're interested in advertising on newzbot.com, please
contact webmaster @ newzbot.com.
Why did you/do you do this?
Well, back in 1993 I was reading on
alt.hackers
about someone writing
an NNTP scanning program and I thought to myself "Wow, that's kinda cool."
Later, after I learned perl and
started goofing off on the Web, I decided
that it would be sort of a nice service if I did occasional scans
and made the output available to the world.
After a while, though, as the site grew in users, I decided
to make some changes so the page was more usable. I added
a search engine and tidied up some of the pages, and then
largely left it alone. However, people kept asking for more
features, and so slowly, when I've had time to, I've added them.
Note that I don't work on newzbot fulltime; indeed, weeks or months
may pass before I do any visible work on the site. I've gotta eat, you know :)
If you're curious to see what I've done to the site recently
(or the systems that support it), check out the changelog.
How often do you update the lists?
Currently, I look for new servers twice per month.
Everything is timestamped, so you can get a general idea of how long
it's been since a server was tested.
I don't want you to scan or post to my server.
If your news server is on the list and you want it removed, first
configure your news server software to either:
exclude everyone, and only allow connections from hosts or networks
you wish to have access. (this is the correct way to set up an access
control list.)
allow everyone, and exclude connections from 'jammed.com'/216.99.218.161.
This is kind of silly, since other people who scan the net for public
news servers will find you. I really suggest the first method.
Once you've done that, visit the
search page,
search for your news server,
and click to have it re-tested. Newzbot will connect, see that you've
restricted access, and remove you from the database, and, if your news
server sends newzbot a 500 message, never contact you again.
If you would like your server excluded from future scans,
click here and
follow the instructions.
If you have trouble with this process, please don't hesitate to
contact webmaster @ newzbot.com & I'll remove you by hand.
I'm having a hard time anthropomorphizing newzbot. Can you help?
One night after dinner I was doodling on an envelope..
[ sounds of black pen scratching ]
mlb:
that's neat.. is that going to be your newzbot.com logo?
well, anyway, you get the idea. Here are some of the sketches,
there may be more later if I can convince her to draw them :)
If you want to use these images in connection w/ the site, be
my guest -- but please don't link directly to them; serve them off your
own server.