[OAI-implementers] OAI-PMH baseURL discovery

Andy Powell a.powell at ukoln.ac.uk
Sun Feb 13 16:03:32 EST 2005


I agree with your conclusion that it is sensible to adopt both approaches
#2 and #3.

I'm not sure what mechanisms are available for updating the REP (Robots Exclusion Protocol), though.
The REP pages are at

http://www.robotstxt.org/wc/robots.html

and the original REP author (I think) is Martijn Koster

http://www.greenhills.co.uk/mak/mak.html

The spec looks very... err... stable?!

I disagree with your implied preference for using <meta> rather than
<link>.  In this case, we clearly want to provide a link to another
resource - therefore the semantics of the <link> tag are much more
appropriate than the semantics of the <meta> tag.
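As a rough illustration of the <link> approach, a robot only needs to scan a page's <link> elements for a relation that points at a baseURL. The Python sketch below uses the standard html.parser module. The rel value "oaipmh-baseurl" is a hypothetical placeholder: no relation name has been agreed in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# HYPOTHETICAL link relation: no rel value for OAI-PMH baseURLs has been agreed.
OAIPMH_REL = "oaipmh-baseurl"


class BaseURLLinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel includes OAIPMH_REL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.base_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if OAIPMH_REL in rel_tokens and href:
            # Resolve relative hrefs against the page they appeared on.
            self.base_urls.append(urljoin(self.page_url, href))


def find_base_urls(page_url, html_text):
    """Return every candidate OAI-PMH baseURL advertised by <link> in html_text."""
    finder = BaseURLLinkFinder(page_url)
    finder.feed(html_text)
    finder.close()
    return finder.base_urls
```

Because the target is an ordinary href, relative URLs and multiple rel tokens come for free, which is part of why <link> fits this job better than inventing new attributes on <meta>.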

I also suspect that your suggested use of

OAIPMHbaseURL="..."

and

OAIPMHrecord="..."

breaks the (X)HTML specs (though I haven't checked).

Andy.

On Sun, 13 Feb 2005, Michael Nelson wrote:

>
> (this is in response to Andy's mesg:
> http://www.openarchives.org/pipermail/oai-implementers/2005-February/001407.html)
>
> Drawing from our experience with mod_oai, we see at least 4 possible
> ways for robots to "automatically" discover OAI-PMH baseURLs:
>
> 1.  develop a separate file, oaipmh.txt, similar in spirit to robots.txt
>
> 2.  add to the existing robots.txt file
>
> 3.  use HTML link or META tags for robots
>
> 4.  use the <friends> component in the Identify response.
>
> We do not prefer #1 - a separate file for robots to check seems unlikely
> to encourage widespread adoption.
>
> We prefer #2 because it injects OAI-PMH into the regular web
> mechanics where it belongs.  Robots already look for this file -
> why not put OAI-PMH statements where they expect to find guidance?
> Similarly, a robots.txt file is easy to install and edit (certainly
> easier than installing most repository software packages), so there
> will be no additional burden on a repository administrator.
>
> #3 can be used in some cases, but it makes an assumption that every
> repository we would like a robot to find has an HTML presence.  #2 and #3
> can be used separately since they address separate use cases.
>
> #4 is important and needs to be reinforced as a way of repositories
> "pointing" to each other.  You can't bootstrap baseURL discovery via
> <friends>, but once a robot knows about a single baseURL, it should be
> able to assemble a list of cooperating repositories.  No new functionality
> is needed for <friends>, but the robot scenario increases the importance
> of its use.
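The <friends> traversal described in #4 amounts to a breadth-first walk from one seed baseURL. The Python sketch below assumes the friends description uses the namespace http://www.openarchives.org/OAI/2.0/friends/ with <baseURL> children. fetch_identify is a hypothetical caller-supplied hook that returns the raw Identify XML for a baseURL (or None on failure), so no HTTP client is assumed.

```python
import xml.etree.ElementTree as ET
from collections import deque

FRIENDS_NS = "http://www.openarchives.org/OAI/2.0/friends/"


def friend_base_urls(identify_xml):
    """Extract the baseURLs listed in <friends> descriptions of an Identify response."""
    root = ET.fromstring(identify_xml)
    # Only baseURL elements in the friends namespace match; the repository's own
    # <baseURL> lives in the main OAI-PMH namespace and is skipped.
    return [el.text.strip() for el in root.iter(f"{{{FRIENDS_NS}}}baseURL") if el.text]


def discover(seed, fetch_identify, limit=100):
    """Breadth-first walk over <friends> links starting from one known baseURL.

    fetch_identify(base_url) is a HYPOTHETICAL hook returning Identify XML or None.
    """
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < limit:
        base_url = queue.popleft()
        order.append(base_url)
        xml_text = fetch_identify(base_url)
        if not xml_text:
            continue  # unreachable or failed repository: keep it, but learn nothing
        for friend in friend_base_urls(xml_text):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return order
```

The seed still has to come from somewhere else (robots.txt or an HTML page), which matches the point that <friends> cannot bootstrap discovery on its own.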
>
> robots.txt
> ----------
>
> The "problem" with robots.txt is that the syntax is very simple and is
> focused on telling robots what they can't do and not on what they should
> do.  So in addition to having a line such as:
>
> OAIPMHbaseURL: http://cs1.ist.psu.edu/cgi-bin/oai.cgi
>
> We would like to expand the syntax of the "Disallow:" tag to include
> alternatives:
>
> Disallow: /citations/ OAIPMHbaseURL: http://cs1.ist.psu.edu/cgi-bin/oai.cgi
>
> Here the OAIPMHbaseURL part gives an alternate means of accessing the
> information prohibited by the Disallow.  Depending on how robust
> robots are with respect to extended syntax, we could repeat the line
> in case the extended line is not understood:
>
> Disallow: /citations/
> Disallow: /citations/  OAIPMHbaseURL: http://cs1.ist.psu.edu/cgi-bin/oai.cgi
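To show how a robot might read the proposed robots.txt lines, here is a minimal Python sketch. It handles both the stand-alone "OAIPMHbaseURL:" directive and the extended "Disallow: <path> OAIPMHbaseURL: <url>" form. Both are proposals from this message, not part of the REP, and case-insensitive field matching is an assumption borrowed from common robots.txt practice.

```python
import re

# Proposed (non-standard) directive name, matched case-insensitively.
DIRECTIVE = "oaipmhbaseurl"
EXTENDED_DISALLOW = re.compile(r"(\S+)\s+OAIPMHbaseURL:\s*(\S+)$", re.IGNORECASE)


def parse_oaipmh_robots(text):
    """Return (disallowed_path_or_None, baseURL) pairs found in robots.txt text."""
    results = []
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace (baseURLs are assumed to
        # contain no '#' fragment).
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()
        value = value.strip()
        if field == DIRECTIVE:
            # Stand-alone form: "OAIPMHbaseURL: <url>"
            results.append((None, value))
        elif field == "disallow":
            # Extended form: "Disallow: <path> OAIPMHbaseURL: <url>"
            match = EXTENDED_DISALLOW.match(value)
            if match:
                results.append((match.group(1), match.group(2)))
    return results
```

A plain "Disallow: /citations/" line yields nothing here, so a robot that does not understand the extension still sees an ordinary exclusion, which is the motivation for repeating the line.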
>
> HTML Tags for Robots
> --------------------
>
> It would be useful to tie an existing HTML page back to the original
> OAI-PMH repository from which it came, such as:
>
> http://uk.arxiv.org/abs/astro-ph/0502028
>
> having something like:
>
> <META NAME="ROBOTS" OAIPMHbaseURL="http://www.arxiv.org/oai2">
>
> It would also be useful to tie the HTML representation back to
> the structured metadata from which it came:
>
> <META NAME="ROBOTS"
> OAIPMHrecord="http://www.arxiv.org/oai2?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:arXiv.org:astro-ph/0502028">
>
> <META NAME="ROBOTS"
> OAIPMHrecord="http://www.arxiv.org/oai2?verb=GetRecord&metadataPrefix=oai_marc&identifier=oai:arXiv.org:astro-ph/0502028">
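The record URLs above are ordinary OAI-PMH GetRecord requests, so a page generator could build them from a baseURL, a metadataPrefix and an identifier. Below is a minimal Python sketch. record_link_tag and its rel="alternate" type="text/xml" values are hypothetical choices (in the spirit of Andy's preference for <link>), not anything agreed in the thread. Two details are worth noting. The sketch percent-encodes ':' and '/' in the identifier, which appears to be what the OAI-PMH 2.0 URL-encoding rules ask for (the URLs quoted above leave them literal). Also, every '&' in the URL must be written as '&amp;' inside an HTML attribute.

```python
from html import escape
from urllib.parse import urlencode


def get_record_url(base_url, metadata_prefix, identifier):
    """Build an OAI-PMH GetRecord request URL (argument values percent-encoded)."""
    query = urlencode({
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    })
    return f"{base_url}?{query}"


def record_link_tag(base_url, metadata_prefix, identifier):
    """Return a HYPOTHETICAL <link> tag pointing a page at its structured record."""
    # rel="alternate" and type="text/xml" are assumptions, not an agreed convention.
    href = escape(get_record_url(base_url, metadata_prefix, identifier), quote=True)
    return f'<link rel="alternate" type="text/xml" href="{href}">'
```

For the arXiv example, get_record_url("http://www.arxiv.org/oai2", "oai_dc", "oai:arXiv.org:astro-ph/0502028") yields the oai_dc record URL with the identifier encoded as oai%3AarXiv.org%3Aastro-ph%2F0502028.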
>
> This is similar to the inverse of a DC.Identifier field -- instead of mapping
> from structured to un/semi-structured, it maps from un/semi-structured
> to structured.
>
> comments welcome,
>
> Michael Nelson & Herbert Van de Sompel
>
> ----
> Michael L. Nelson mln at cs.odu.edu http://www.cs.odu.edu/~mln/
> Dept of Computer Science, Old Dominion University, Norfolk VA 23529
> +1 757 683 6393 +1 757 683 4900 (f)
>

Andy
--
Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK
http://www.ukoln.ac.uk/ukoln/staff/a.powell/      +44 1225 383933
Resource Discovery Network http://www.rdn.ac.uk/


