Spider
Search Engine Spider
A spider is a program that browses, or crawls, websites and extracts information for search engines to index in their databases.
Spiders follow hyperlinks, so to rank well in search engines it is important to develop backward links to your site from the major online directories and other websites. Search engine spiders will then follow these links and spider your website. It is also important to create a good internal linking structure for your website, so that the spider can reach and crawl every page of your site.
Spiders are also called crawlers or bots.
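The link-following behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search engine works: the names LinkExtractor, crawl and SITE are hypothetical, the example.com pages are invented, and an in-memory dictionary stands in for real HTTP requests. A real spider would also respect robots.txt and rate limits.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one site; returns URLs in the order visited.

    `fetch` is a caller-supplied function mapping a URL to its HTML text,
    or None if the page cannot be retrieved (an assumption of this sketch).
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Stay on the same host and never queue a page twice.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# A tiny in-memory "site" standing in for real HTTP requests.
SITE = {
    "http://example.com/": '<a href="/about.html">About</a> <a href="/products.html">Products</a>',
    "http://example.com/about.html": '<a href="/">Home</a>',
    "http://example.com/products.html": '<a href="/widget.html">Widget</a>',
    "http://example.com/widget.html": "<p>No links here.</p>",
    "http://example.com/orphan.html": "<p>Nothing links to this page.</p>",
}

pages = crawl("http://example.com/", SITE.get)
```

In this sketch the page orphan.html is never visited, because no other page links to it. That is the practical reason a good internal linking structure matters: a spider can only find pages it can reach by following links.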
RDF Spider
Using RDF, cross-site search mechanisms can be built to search the content of other websites. Box UK's Content Management System, Amaxus, automatically produces RDF files that describe the resources and metadata available from each server.
A central RDF ‘harvester’, or spider, periodically (e.g. every night) downloads the RDF files from each site and caches the results in a central database. This database can then be used to search the content from all sites.
The use of RDF files allows future sites to be included regardless of the CMS they use, provided that each site can produce an RDF file. RDF is a W3C standard.
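The harvest-and-cache cycle described above might look roughly like the following Python sketch. It is illustrative only and does not reflect the actual Amaxus output or harvester: it assumes each site's RDF file is RDF/XML containing rdf:Description elements with Dublin Core title and description properties, and the names parse_rdf, harvest, search, make_feed and FEEDS are hypothetical. The feeds are held in memory here; a real harvester would download them over HTTP on a schedule.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def parse_rdf(rdf_xml):
    """Yield (url, title, description) for each rdf:Description element."""
    root = ET.fromstring(rdf_xml)
    for node in root.iter(f"{{{RDF_NS}}}Description"):
        url = node.get(f"{{{RDF_NS}}}about")
        title = node.findtext(f"{{{DC_NS}}}title", default="")
        description = node.findtext(f"{{{DC_NS}}}description", default="")
        yield url, title, description


def harvest(db, feeds):
    """Replace the cached records for each site with its latest RDF file."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS resources "
        "(site TEXT, url TEXT, title TEXT, description TEXT)"
    )
    for site, rdf_xml in feeds.items():
        # Drop the previous harvest for this site so reruns do not duplicate rows.
        db.execute("DELETE FROM resources WHERE site = ?", (site,))
        db.executemany(
            "INSERT INTO resources VALUES (?, ?, ?, ?)",
            [(site, *record) for record in parse_rdf(rdf_xml)],
        )
    db.commit()


def search(db, term):
    """Return (site, url, title) rows whose title or description contains term."""
    pattern = f"%{term}%"
    return db.execute(
        "SELECT site, url, title FROM resources "
        "WHERE title LIKE ? OR description LIKE ? ORDER BY site, url",
        (pattern, pattern),
    ).fetchall()


def make_feed(url, title, description):
    """Build a one-resource RDF/XML document (sample data for this sketch)."""
    return f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="{url}">
    <dc:title>{title}</dc:title>
    <dc:description>{description}</dc:description>
  </rdf:Description>
</rdf:RDF>"""


# Invented feeds standing in for the RDF files downloaded from each site.
FEEDS = {
    "site-a": make_feed("http://a.example/news", "Annual report", "Figures for the year"),
    "site-b": make_feed("http://b.example/docs", "Style guide", "How to write a report"),
}

db = sqlite3.connect(":memory:")
harvest(db, FEEDS)
results = search(db, "report")
```

Because the central database only ever sees the RDF files, adding a new site in this sketch means adding one more entry to FEEDS, whatever CMS produced it.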
