[OAI-implementers] OAI-PMH baseURL discovery

Michael Nelson mln at cs.odu.edu
Mon Feb 14 12:37:06 EST 2005

> > I'm not sure what mechanisms are available for updating the REP though?
> > The REP pages are at
> >
> May not need to change, but just restarted, and driven by the community.
> Using the last rfc would not break anything, and would allow for the
> easy specification of the information
>   http://www.robotstxt.org/wc/norobots-rfc.html
> Add a standard harvester name, and use the proposed allow extension to
> specify the path.
>      User-agent: OAIPMHbaseURL
>       Allow:   /path_to_oai
> The rules should be written so that if your harvester has a user agent
> which is disallowed, then you should honor that request.

Does anyone have any insight into why this draft did not become an RFC?
I've poked around a variety of robots.txt files, and I don't see any use
of "Allow:" -- perhaps I've just missed it?

Either way, the Allow syntax as proposed does not "feel right".  Largely
due to the limited syntax of the existing robots.txt format, the
dependence on the order of the lines is greatly increased.  Consider a
site that has content corresponding to two different baseURLs:

User-agent: *
Disallow: /a
Disallow: /b

It would be nice to bind these relative URIs to their original location,
something like:

OAI-PMH-baseURL: /a http://foo.org/oai
OAI-PMH-baseURL: /b http://bar.edu/perl/oai

thus localizing the impact of the new syntax: one new line, versus a
reserved word for User-agent plus a new line (which may have
implications outside of OAI-PMH).  If we wanted to bind the entire site
to a single repository, we could just use:

OAI-PMH-baseURL: / http://xyz.edu/oai.cgi
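To make the proposal concrete, here is a minimal sketch of how a
harvester might extract these bindings from a robots.txt file.  The
"OAI-PMH-baseURL:" field name and its two-token value (local path
prefix, then baseURL) are taken from the proposal above, not from any
standard; the parser name and behavior are assumptions for illustration.

```python
def parse_oai_baseurls(robots_txt: str) -> dict:
    """Map each local path prefix to its proposed OAI-PMH baseURL."""
    bindings = {}
    for line in robots_txt.splitlines():
        line = line.split('#', 1)[0].strip()   # drop comments, whitespace
        if not line.lower().startswith('oai-pmh-baseurl:'):
            continue
        value = line.split(':', 1)[1].strip()  # "<path> <baseURL>"
        parts = value.split(None, 1)
        if len(parts) == 2:
            bindings[parts[0]] = parts[1]
    return bindings

example = """\
User-agent: *
Disallow: /a
Disallow: /b

OAI-PMH-baseURL: /a http://foo.org/oai
OAI-PMH-baseURL: /b http://bar.edu/perl/oai
"""

print(parse_oai_baseurls(example))
# -> {'/a': 'http://foo.org/oai', '/b': 'http://bar.edu/perl/oai'}
```

Because each binding sits on its own line and carries both the path and
the baseURL, the result does not depend on line order, unlike the
User-agent/Allow grouping.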


Michael & Herbert

Michael L. Nelson mln at cs.odu.edu http://www.cs.odu.edu/~mln/
Dept of Computer Science, Old Dominion University, Norfolk VA 23529
+1 757 683 6393 +1 757 683 4900 (f)
