[OAI-implementers] DP9- An OAI Gateway Service for Web Crawlers

Michael L. Nelson mln@ils.unc.edu
Wed, 21 Nov 2001 09:44:13 -0500 (EST)


>Doesn't the '?' in this URL mean that at least some search engines will
>not index this resource?  It might be better to configure things so that
>the persistent URL is of the form

Some might not, but I think most crawlers now ignore that convention.

At http://naca.larc.nasa.gov/reports/, crawlers happily trudge through all
my reports, each of which is served by an index.cgi with arguments.  Inktomi,
Google, and others happily invoke every combination of arguments presented
to them.

And robots.txt does not support wildcards, so I can't have a line like:

	Disallow: /reports/*/*/index.cgi
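To illustrate why that line would not help: under the original robots.txt
convention, a Disallow value is treated as a literal path prefix, so the
'*' characters are compared as ordinary characters.  The following is only
a minimal sketch of that prefix matching, not any particular crawler's
code; the function name is_disallowed and the report path are hypothetical
and chosen just for illustration.

```python
# Minimal sketch of literal prefix matching for Disallow values, as in the
# original robots.txt convention. Names and paths here are hypothetical.

def is_disallowed(path, disallow_rules):
    """Return True if `path` starts with any non-empty Disallow value."""
    return any(bool(rule) and path.startswith(rule) for rule in disallow_rules)


# Hypothetical report path, for illustration only.
report = "/reports/1999/example-report/index.cgi"

# The '*' is compared literally, so the wildcard line blocks nothing.
print(is_disallowed(report, ["/reports/*/*/index.cgi"]))  # False

# A plain prefix does match, but it blocks everything under /reports/.
print(is_disallowed(report, ["/reports/"]))  # True
```

With prefix-only matching the choice is all or nothing for a directory
tree: either the whole of /reports/ is excluded or none of it is.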

I could shut off all robots, but so far the load has not been too
bad.  Still, I'm considering using DP9 for robot access -- it might make
their crawls a little smarter.

regards,

Michael

---
Michael L. Nelson
NASA Langley Research Center		m.l.nelson@larc.nasa.gov
MS 158, Hampton, VA 23681		http://www.ils.unc.edu/~mln/
+1 757 864 8511				+1 757 864 8342 (f)