[OAI-implementers] OAI-PMH baseURL discovery

Andy Powell a.powell at ukoln.ac.uk
Sun Feb 13 16:03:32 EST 2005

I agree with your conclusion that it is sensible to adopt both approaches
#2 and #3.

I'm not sure what mechanisms are available for updating the REP (Robots Exclusion Protocol), though.
The REP pages are at


and the original REP author (I think) is Martijn Koster


The spec looks very... err... stable?!

I disagree with your implied preference for using <meta> rather than
<link>.  In this case, we clearly want to provide a link to another
resource - therefore the semantics of the <link> tag are much more
appropriate than the semantics of the <meta> tag.

I also suspect that your suggested use of non-standard attributes such as
OAIPMHbaseURL on the <meta> tag would break the (X)HTML specs (though I
haven't checked)?


On Sun, 13 Feb 2005, Michael Nelson wrote:

> (this is in response to Andy's mesg:
> http://www.openarchives.org/pipermail/oai-implementers/2005-February/001407.html)
> Drawing from our experience with mod_oai, we see at least 4 possible
> ways for robots to "automatically" discover OAI-PMH baseURLs:
> 1.  develop a separate file, oaipmh.txt, similar in spirit to robots.txt
> 2.  add to the existing robots.txt file
> 3.  use HTML link or META tags for robots
> 4.  use the <friends> component in the Identify response.
> We do not prefer #1 - a separate file for robots to check seems unlikely
> to encourage widespread adoption.
> We prefer #2 because it injects OAI-PMH into the regular web
> mechanics where it belongs.  Robots already look for this file -
> why not put OAI-PMH statements where they expect to find guidance?
> Similarly, a robots.txt file is easy to install and edit (certainly
> easier than installing most repository software packages), so there
> will be no additional burden on a repository administrator.
> #3 can be used in some cases, but it makes an assumption that every
> repository we would like a robot to find has an HTML presence.  #2 and #3
> can be used separately since they address separate use cases.
> #4 is important and needs to be reinforced as a way of repositories
> "pointing" to each other.  You can't bootstrap baseURL discovery via
> <friends>, but once a robot knows about a single baseURL, it should be
> able to assemble a list of cooperating repositories.  No new functionality
> is needed for <friends>, but the robot scenario increases the importance
> of its use.
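Once one baseURL is known, following <friends> amounts to a breadth-first walk over Identify responses. The sketch below assumes the friends container uses the namespace http://www.openarchives.org/OAI/2.0/friends/ (as in the OAI-PMH implementation guidelines); the names fetch_identify, friends_from_identify and crawl_friends are hypothetical, and the fetch function is passed in so a test stub or any other HTTP client can replace it.

```python
import urllib.request
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable

# Namespace of the <friends> description container (assumed, per the OAI-PMH guidelines).
FRIENDS_NS = "http://www.openarchives.org/OAI/2.0/friends/"


def fetch_identify(base_url: str) -> bytes:
    """Issue an Identify request and return the raw XML bytes."""
    with urllib.request.urlopen(f"{base_url}?verb=Identify", timeout=30) as resp:
        return resp.read()


def friends_from_identify(identify_xml: bytes) -> list[str]:
    """Return the baseURLs listed inside any <friends> description."""
    root = ET.fromstring(identify_xml)
    # Only baseURL elements in the friends namespace match; the repository's
    # own <baseURL> is in the OAI-PMH namespace and is skipped.
    return [el.text.strip()
            for el in root.iter(f"{{{FRIENDS_NS}}}baseURL")
            if el.text and el.text.strip()]


def crawl_friends(seed: str,
                  fetch: Callable[[str], bytes] = fetch_identify) -> set[str]:
    """Breadth-first walk over <friends> links, starting from one known baseURL."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        base_url = queue.popleft()
        try:
            friends = friends_from_identify(fetch(base_url))
        except (OSError, ET.ParseError):
            continue  # unreachable or malformed repository: skip it, keep crawling
        for friend in friends:
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return seen
```

Repositories that are listed as friends but cannot be reached still end up in the result set; a real robot would probably record them separately and retry later.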
> robots.txt
> ----------
> The "problem" with robots.txt is that the syntax is very simple and is
> focused on telling robots what they can't do and not on what they should
> do.  So in addition to having a line such as:
> OAIPMHbaseURL: http://cs1.ist.psu.edu/cgi-bin/oai.cgi
> We would like to expand the syntax of the "Disallow:" tag to include
> alternatives:
> Disallow: /citations/ OAIPMHbaseURL: http://cs1.ist.psu.edu/cgi-bin/oai.cgi
> where the OAIPMHbaseURL value gives alternate access to the
> information prohibited by the Disallow.  Depending on how robust
> robots are with respect to extended syntax, we could repeat the line
> in case the extended line is not understood:
> Disallow: /citations/
> Disallow: /citations/ OAIPMHbaseURL: http://cs1.ist.psu.edu/cgi-bin/oai.cgi
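A robot that already fetches robots.txt could pick up both proposed forms with a small line-by-line parser. This is a minimal sketch that assumes exactly the spellings used above (a standalone OAIPMHbaseURL: line, and Disallow: <path> OAIPMHbaseURL: <url>), matches field names case-insensitively and drops # comments; fetch_robots_txt and parse_oai_hints are hypothetical names.

```python
import re
import urllib.request
from urllib.parse import urljoin

# Hypothetical extended form:  Disallow: /citations/ OAIPMHbaseURL: http://...
EXTENDED_DISALLOW = re.compile(
    r"^Disallow:\s*(?P<path>\S+)\s+OAIPMHbaseURL:\s*(?P<base>\S+)$", re.IGNORECASE)


def fetch_robots_txt(site: str) -> str:
    """Download /robots.txt for a site such as "http://cs1.ist.psu.edu/"."""
    with urllib.request.urlopen(urljoin(site, "/robots.txt"), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_oai_hints(robots_txt: str) -> tuple[list[str], dict[str, str]]:
    """Return (standalone baseURLs, {disallowed path: alternative baseURL})."""
    base_urls: list[str] = []
    alternatives: dict[str, str] = {}
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        extended = EXTENDED_DISALLOW.match(line)
        if extended:
            alternatives[extended.group("path")] = extended.group("base")
            continue
        field, sep, value = line.partition(":")  # split on the first colon only
        if sep and field.strip().lower() == "oaipmhbaseurl" and value.strip():
            base_urls.append(value.strip())
    return base_urls, alternatives
```

The plain "Disallow: /citations/" line is deliberately ignored here, so it keeps doing its normal job for robots that do not know the extension. Parsers that skip fields they do not recognise (Python's urllib.robotparser does) would also be unaffected by a standalone OAIPMHbaseURL: line.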
> HTML Tags for Robots
> --------------------
> It would be useful to tie an existing HTML page, such as
> http://uk.arxiv.org/abs/astro-ph/0502028, back to the original
> OAI-PMH repository from which it came by having something like:
> <META NAME="ROBOTS" OAIPMHbaseURL="http://www.arxiv.org/oai2">
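On the robot side, html.parser from the Python standard library is enough to pull such a baseURL out of a page. The sketch accepts the <META> form proposed above and, for comparison, a <link> form along the lines Andy prefers. The rel value "oai-pmh-baseurl" is purely hypothetical (it is not a registered link type), as are the names OAIBaseURLFinder and find_base_urls.

```python
from html.parser import HTMLParser

# Hypothetical rel value for the <link> variant; not a registered link type.
LINK_REL = "oai-pmh-baseurl"


class OAIBaseURLFinder(HTMLParser):
    """Collect OAI-PMH baseURLs advertised by <meta> or <link> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.base_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None.
        a = {name: value or "" for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            # Proposed form: <META NAME="ROBOTS" OAIPMHbaseURL="...">
            if a.get("oaipmhbaseurl"):
                self.base_urls.append(a["oaipmhbaseurl"])
        elif tag == "link" and LINK_REL in a.get("rel", "").lower().split():
            # Alternative form: <link rel="oai-pmh-baseurl" href="...">
            if a.get("href"):
                self.base_urls.append(a["href"])


def find_base_urls(html: str) -> list[str]:
    """Return every advertised baseURL, in document order."""
    finder = OAIBaseURLFinder()
    finder.feed(html)
    finder.close()
    return finder.base_urls
```

Self-closing XHTML tags such as <link ... /> are handled too, because HTMLParser routes them through handle_starttag by default.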
> It would also be useful to tie the HTML representation back to
> the structured metadata from which it came:
> <META NAME="ROBOTS" OAIPMHrecord="http://www.arxiv.org/oai2?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:arXiv.org:astro-ph/0502028">
> <META NAME="ROBOTS" OAIPMHrecord="http://www.arxiv.org/oai2?verb=GetRecord&metadataPrefix=oai_marc&identifier=oai:arXiv.org:astro-ph/0502028">
> This is similar to the inverse of a DC.Identifier field: instead of mapping
> from structured to un/semi-structured, it maps from un/semi-structured
> to structured.
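The OAIPMHrecord values are ordinary GetRecord requests, so a page generator could build them from the repository's baseURL, the item's OAI identifier and a metadataPrefix. A sketch under those assumptions (get_record_url and oaipmh_record_meta are hypothetical names): it percent-encodes argument values, which the OAI-PMH specification asks for with reserved characters such as ':' and '/', and writes '&' as '&amp;' once the URL sits inside an HTML attribute.

```python
import html
from urllib.parse import quote, urlencode


def get_record_url(base_url: str, identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL."""
    query = urlencode(
        {"verb": "GetRecord", "metadataPrefix": metadata_prefix, "identifier": identifier},
        quote_via=quote,  # encode all but letters, digits and _.-~ (':' -> %3A, '/' -> %2F)
    )
    return f"{base_url}?{query}"


def oaipmh_record_meta(base_url: str, identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Emit the proposed <META NAME="ROBOTS" OAIPMHrecord="..."> tag."""
    url = get_record_url(base_url, identifier, metadata_prefix)
    # The '&' separators must be written as &amp; inside an HTML attribute value.
    return f'<META NAME="ROBOTS" OAIPMHrecord="{html.escape(url, quote=True)}">'
```

For the arXiv example, get_record_url("http://www.arxiv.org/oai2", "oai:arXiv.org:astro-ph/0502028") returns http://www.arxiv.org/oai2?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai%3AarXiv.org%3Aastro-ph%2F0502028, and passing "oai_marc" as the prefix gives the second record.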
> comments welcome,
> Michael Nelson & Herbert Van de Sompel
> ----
> Michael L. Nelson mln at cs.odu.edu http://www.cs.odu.edu/~mln/
> Dept of Computer Science, Old Dominion University, Norfolk VA 23529
> +1 757 683 6393 +1 757 683 4900 (f)
> _______________________________________________
> OAI-implementers mailing list
> List information, archives, preferences and to unsubscribe:
> http://www.openarchives.org/mailman/listinfo/oai-implementers

Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK
http://www.ukoln.ac.uk/ukoln/staff/a.powell/      +44 1225 383933
Resource Discovery Network http://www.rdn.ac.uk/
