The Spider
The articles in the database are gathered by a spider that crawls the Internet looking for relevant information.
It is an object-oriented, asynchronous Internet spider written in C++. It is platform-agnostic and runs on Windows Vista, Windows XP, or UNIX.
It can store documents in a PostgreSQL database or send them to an Internet URI. You will need to install PostgreSQL on your box.
You can browse or download the source code for the spider here.
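
Since the spider can deliver documents either to a database or to an Internet URI, one way to picture that choice is a small output interface with one implementation per destination. The sketch below is only an illustration under that assumption, not the spider's actual code: `Document`, `DocumentSink`, `DatabaseSink`, `UriSink`, and `deliver` are hypothetical names, and both sinks are in-memory stand-ins that record what they would do rather than talking to PostgreSQL or the network.

```cpp
#include <string>
#include <vector>

// Hypothetical document type; the real spider's types are not shown here.
struct Document {
    std::string url;
    std::string body;
};

// Hypothetical output interface: one implementation per destination.
class DocumentSink {
public:
    virtual ~DocumentSink() = default;
    virtual void store(const Document& doc) = 0;
};

// Stand-in for the database option. It only collects the documents that a
// real sink would insert into PostgreSQL.
class DatabaseSink : public DocumentSink {
public:
    void store(const Document& doc) override { rows.push_back(doc); }
    std::vector<Document> rows;
};

// Stand-in for the URI option. It only remembers which document URLs would
// have been sent to the target URI; no request is made.
class UriSink : public DocumentSink {
public:
    explicit UriSink(std::string target) : target_uri(std::move(target)) {}
    void store(const Document& doc) override { sent.push_back(doc.url); }
    std::string target_uri;
    std::vector<std::string> sent;
};

// Hand every crawled document to whichever sink was configured.
void deliver(const std::vector<Document>& docs, DocumentSink& sink) {
    for (const auto& doc : docs) {
        sink.store(doc);
    }
}
```

With this shape, the crawling code calls `deliver` without knowing which destination is in use, so switching between the database and a URI would be a matter of constructing a different sink.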