[OAI-general] OAI identifiers / namespace

Pete Johnston Pete.Johnston at eduserv.org.uk
Tue Oct 31 13:22:12 EST 2006


Tom Habing wrote:

> Jakob Voss wrote:
> > Hi!
> > 
> > Is there is service provider that can be used as a link 
> resolver for 
> > OIA identifiers? OAI-IDs could be very useful to identify 
> resources in 
> > the same way as DOI (or even better because you can get all the 
> > metadata in machine readable format and OAI is free) but up 
> to now I 
> > have not found a way to get the document and/or metadata of a given 
> > OAI-ID. The problem is to find out the base url of the data 
> provider 
> > if you only know the OAI-ID.
> 
> Hi Jakob,
> 
> I've had a resolver service of sorts as part of the OAI 
> Registry at UIUC for a while.  For example,
> 
> http://gita.grainger.uiuc.edu/registry/rx.asp?oai:HUBerlin.de:1253
> 
> will redirect to the actual record as oai_dc:
> 
> http://edoc.hu-berlin.de/OAI-2.0?verb=GetRecord&identifier=oai
%3AHUBerlin%2Ede%3A1253&metadataPrefix=oai_dc

[snip]

> You might want to also look at Jeff Young's ERROL service for 
> a more sophisticated resolver service that uses some of the 
> features of the UIUC registry:
> 
> http://www.oclc.org/research/projects/oairesolver/default.htm

It does strike me that this sort of functionality would be much easier
to provide if data providers adopted the practice of assigning URIs
using the http: URI scheme as identifiers of OAI-PMH items (note: I'm
talking about identifiers of OAI-PMH _items_, not identifiers of the
resources described by metadata records exposed by/extracted from that
item - though the http: URI scheme also works perfectly well as an
identifier scheme for those described resources too!). 

The OAI-PMH spec licenses the use of any URI scheme, including the http:
URI scheme, for item identifiers:

===
The format of the unique identifier must correspond to that of the URI
(Uniform Resource Identifier) syntax. Individual communities may develop
community-specific URI schemes for coordinated use across repositories.
The scheme component of the unique identifiers must not correspond to
that of a recognized URI scheme unless the identifiers conform to that
scheme.
===

http://www.openarchives.org/OAI/openarchivesprotocol.html#UniqueIdentifi
er

The OAI-PMH spec goes on to offer the "OAI Identifier" as an alternative
to a registered URI scheme:

===
Repositories may implement the oai-identifier syntax described in the
accompanying Implementation Guidelines document.
===

The description of the "OAI Identifier" notes that an OAI Identifier has
the form

===
  oai-identifier = scheme ":" namespace-identifier ":" local-identifier

  scheme = "oai"

  namespace-identifier = domainname-word "." domainname
  domainname = domainname-word [ "." domainname ]
  domainname-word = alpha *( alphanum | "-" )

  local-identifier = 1*uric
===

And also:

===
Organizations must choose namespace-identifier values which correspond
to a domain-name that they have registered, and are committed to
maintaining. Note that since the oai-identifier is case-sensitive, a
particular capitalization style must be selected and used consistently.
A single domain name should not be used with variant capitalizations.

Domain name registration is used to avoid the need for any additional
registration service for oai-identifiers. Domain name based identifiers
guarantee global uniqueness without the need for OAI registration as
required with the earlier, v1.0/1.1 specification.
===

http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm

i.e. the assignment of "OAI Identifiers" uses the same mechanism for
distributing the ownership of sets of identifiers as does the http: URI
scheme i.e. they both rely on the registration of domain-names. 

And in both cases the persistence of an identifier (i.e. the notion that
the identifier continues to identify the same resource - the same
OAI-PMH item - over time) is dependent on

(a) the continued ownership of the domain-name by its current owner: 
If Cornell forgets to renew its subscription for the arxiv.org
domain-name, I can buy it and start assigning identifiers using that
domain-name, and this week I can assign the same identifier that Cornell
assigned last week to identify an item that disseminates metadata about
a document on the topic of quantum slow motion to identify an item that
disseminates metadata about a document on the topic of Sunderland
Football Club, and I can start telling the world that that identifier
identifies that second item. The result is that a single identifier has
been assigned to two different resources over a period of time. That
argument applies in the same way to both an OAI Identifier like
oai:arXiv.org:quant-ph/9901001 and to an http: URI like
http://arxiv.org/quant-ph/9901001

(b) the sensible management of identifiers within/under that domain:
If last week I assigned an identifier to an item that disseminates
metadata about a document on the topic of quantum slow motion and told
the world that that identifier identified that resource, then this week
I assign that same identifier to an item that disseminates metadata
about a document on the topic of Sunderland Football Club and tell the
world that that identifier identifies that item, then again the same
identifier has been assigned to two different resources over a period of
time, in this case, by a single agency.  And again, that argument
applies in the same way to both an OAI Identifier like
oai:my.org:quant-ph/9901001 and to an http: URI like
http://my.org/quant-ph/9901001

So - as far as I can see - in terms of identification, anything offered
by the use of the OAI Identifier syntax I can obtain using the http: URI
scheme.

What the use of the http: URI scheme offers in addition is that there is
a widely deployed network protocol, HTTP, which makes use of the http:
URI scheme. This means that there are widely available mechanisms
available to support the use of http: URIs not only to _identify_
resources but to retrieve representations of those identified resources.


i.e. if an OAI-PMH item has the identifier
http://my.org/quant-ph/9901001 , a consumer can make use of the existing
DNS/HTTP infrastructure to request a representation of the identified
resource from the resource owner (using the HTTP protocol), and the
resource owner can make available such a representation using that
infrastructure. The identifier of my OAI-PMH item is de-referenceable,
if you like - and easily/cheaply de-referenceable without having to
build or use any new resolution systems.

The W3C Technical Architecture Group have an excellent document on this
topic

http://www.w3.org/2001/tag/doc/URNsAndRegistries-50

(It's specifically about URI schemes and I recognise that "oai" is not a
URI scheme, but I think many of the arguments in that document are
relevant to the choice of identifiers for OAI-PMH items.)

(The document http://www.w3.org/2001/tag/doc/SchemeProtocols.html may
also be of interest. It's just an editor's draft, not a finished "TAG
finding", but I found it very helpful. It emphasises very clearly that
the HTTP protocol and the http: URI scheme are different things, that
operations on a resource identified using an http: URI may be available
using protocols other than HTTP, and the HTTP protocol may be used for
operations on resources identified by URI schemes other than the http:
URI scheme).

The OAI-PMH spec is silent on the format(s) of the representation(s) to
be made available for an OAI-PMH item - and I guess any expectation that
any single form of representation should be served would be a matter for
discussion. But it seems to me that an OAI-PMH GetRecord response for
the identified item using the oai_dc metadata format would be one
reasonable representation of the OAI-PMH item.

One important point: the OAI-PMH spec is clear that the OAI-PMH item is
a different thing from the things described by the metadata records
exposed by/extracted from an OAI-PMH item - and indeed, given the way
that the OAI-PMH protocol has been used in practice, such a metadata
record might contain descriptions of, and identifiers for, multiple
resources. So it is important to ensure that any http: URI assigned to
identify an OAI-PMH item is used to identify only that OAI-PMH item, and
is not used to identify some other resource. 

So, in short, I think it seems quite reasonable for data providers to
consider using the http: URI scheme to provide identifiers for OAI-PMH
items. It would facilitate the sort of functionality that is discussed
in the previous messages, without a dependency on additional registry
infrastructure or mappings, and it would not preclude the sort of richer
functionality that is provided by systems such as the ERRoL service
(i.e. anywhere ERRoL uses an OAI Identifier as the identifier of an
OAI-PMH item, an http: URI could be used - I think I recall seeing some
examples of http: URIs already in use in this context, but I can't find
them just now!)

Cheers

Pete
---
Pete Johnston
Technical Researcher, Eduserv Foundation
Web: http://www.eduserv.org.uk/foundation/
Email: pete.johnston at eduserv.org.uk 
Tel: +44 (0)1225 474323 



More information about the OAI-general mailing list