Report by Michael Mauldin (Lycos)
(later edited by Michael Schwartz)
While the overall workshop goal was to determine areas where standards could be pursued, the Spidering BOF attempted to reach actual standards agreements about some immediate-term issues facing robot-based search services, at least among the spider-based search service representatives who attended the workshop (Excite, InfoSeek, and Lycos). The agreements fell into four areas, but we report only three of them here. The fourth area concerned a KEYWORDS tag, which many workshop participants felt was not appropriate for specification by this BOF without the participation of other groups that have been working on that issue.
The remaining three areas were:
<META NAME="ROBOTS" CONTENT="ALL | NONE | NOINDEX | NOFOLLOW">
default = empty = "ALL"
"NONE" = "NOINDEX, NOFOLLOW"
The filler is a comma-separated list of terms: ALL, NONE, INDEX, NOINDEX, FOLLOW, NOFOLLOW.
Discussion: This tag is meant for users who cannot control the robots.txt file at their sites. It gives them a last chance to keep their content out of search services. It was decided not to add syntax to allow robot-specific permissions within the meta-tag.
INDEX means that robots are welcome to include this page in search services.
FOLLOW means that robots are welcome to follow links from this page to find other pages.
So a value of "NOINDEX" allows the subsidiary links to be explored, even though the page is not indexed. A value of "NOFOLLOW" allows the page to be indexed, but no links from the page are explored (this may be useful if the page is a free entry point into pay-per-view content, for example). A value of "NONE" tells the robot to ignore the page.
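The semantics above can be sketched as a small parser that turns a ROBOTS CONTENT value into a pair of permissions. This is only an illustrative sketch, not part of the agreement: the function name `parse_robots_content` is hypothetical, and three behaviors are assumptions the report does not specify (terms are matched case-insensitively, unrecognized terms are ignored, and when terms conflict the later one wins).

```python
def parse_robots_content(content):
    """Return (index, follow) permissions for a ROBOTS meta CONTENT value.

    Hypothetical helper; the report defines the terms but not a parsing
    algorithm.
    """
    # default = empty = "ALL": the robot may index and follow links.
    index, follow = True, True
    for term in (content or "").split(","):
        # Assumption: surrounding whitespace and letter case are ignored.
        term = term.strip().upper()
        if term == "ALL":
            index, follow = True, True
        elif term == "NONE":
            # "NONE" = "NOINDEX, NOFOLLOW"
            index, follow = False, False
        elif term == "INDEX":
            index = True
        elif term == "NOINDEX":
            index = False
        elif term == "FOLLOW":
            follow = True
        elif term == "NOFOLLOW":
            follow = False
        # Assumption: any other term is silently ignored.
    return index, follow
```

For example, `parse_robots_content("NOINDEX")` yields `(False, True)`: the page is left out of the index, but its links may still be explored.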
<META NAME="DESCRIPTION" CONTENT="...text...">
The intent is that the text can be used by a search service when printing a summary of the document. The text should not contain any formatting information.
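As a rough illustration of how a search service might pick up this text, the sketch below uses Python's standard `html.parser` module to collect META tags and return the DESCRIPTION value as a one-line summary. The names `MetaExtractor` and `summary_text` are hypothetical, and collapsing runs of whitespace is an assumption made here, not something the BOF specified.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name")
        content = attr.get("content")
        if name and content is not None:
            # Keep the first occurrence of each name, keyed in upper case.
            self.meta.setdefault(name.upper(), content)


def summary_text(html, fallback=""):
    """Return the DESCRIPTION text with whitespace collapsed, or fallback."""
    parser = MetaExtractor()
    parser.feed(html)
    description = parser.meta.get("DESCRIPTION")
    if description is None:
        return fallback
    # Assumption: runs of whitespace are collapsed to single spaces.
    return " ".join(description.split())


page = '<html><head><META NAME="DESCRIPTION" CONTENT="A catalog  of\nsearch tools."></head></html>'
print(summary_text(page))
```

A service without a DESCRIPTION tag to draw on would still need its own way of producing a summary; the `fallback` parameter merely marks where that would plug in.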