[OAI-general] OAI identifiers / namespace

Fri Nov 3 08:00:46 EST 2006

Pete Johnston wrote:

> Tom Habing wrote:
>> You might want to also look at Jeff Young's ERROL service for 
>> a more sophisticated resolver service that uses some of the 
>> features of the UIUC registry:
>>
>> http://www.oclc.org/research/projects/oairesolver/default.htm
> 
> It does strike me that this sort of functionality would be much easier
> to provide if data providers adopted the practice of assigning URIs
> using the http: URI scheme as identifiers of OAI-PMH items (note: I'm
> talking about identifiers of OAI-PMH _items_, not identifiers of the
> resources described by metadata records exposed by/extracted from that
> item - though the http: URI scheme also works perfectly well as an
> identifier scheme for those described resources too!). 

In RFC 1738 Tim Berners-Lee wrote:

"Users should beware that there is no general guarantee that a URL which
at one time points to a given object continues to do so, and does not
even at some later time point to a different object due to the
movement of objects on servers."

In theory URLs can also be used as identifiers, but in practice this is
plain stupid. There are far better URI namespaces like info: and
urn:nbn: (oai: has not been officially registered yet).

Thanks for your detailed description and references. You wrote:

> http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm
> 
> i.e. the assignment of "OAI Identifiers" uses the same mechanism for
> distributing the ownership of sets of identifiers as does the http: URI
> scheme i.e. they both rely on the registration of domain-names. 

Both rely on the registration of domain names, but that's all. OAI
Identifiers contain more and http: URIs contain more, and this "more"
does not match at all!

> And in both cases the persistence of an identifier (i.e. the notion that
> the identifier continues to identify the same resource - the same
> OAI-PMH item - over time) is dependent on
> 
> (a) the continued ownership of the domain-name by its current owner: 
> If Cornell forgets to renew its subscription for the arxiv.org
> domain-name, I can buy it and start assigning identifiers using that
> domain-name, and this week I can assign the same identifier that Cornell
> assigned last week to identify an item that disseminates metadata about
> a document on the topic of quantum slow motion to identify an item that
> disseminates metadata about a document on the topic of Sunderland
> Football Club, and I can start telling the world that that identifier
> identifies that second item. The result is that a single identifier has
> been assigned to two different resources over a period of time. That
> argument applies in the same way to both an OAI Identifier like
> oai:arXiv.org:quant-ph/9901001 and to an http: URI like
> http://arxiv.org/quant-ph/9901001

This obviously is an error in the OAI Identifier guideline. If an
institution changes its domain name (this happens frequently!) then
it still has ownership of the oai namespace-identifier it used
before - at least the old identifiers cannot be reused.

By the way the implementation guideline says:

> Domain name registration is used to avoid the need for any additional
> registration service for oai-identifiers.

Sorry, but that's an illusion.

First, oai namespace-identifiers are case-sensitive while domain names
are not. Given a domain name, you still don't know the namespace-identifier.

Second, there is no way to determine whether a given namespace-identifier
has ever been assigned. The existence of a matching domain name does
not tell you whether the owner of the domain has even heard of OAI.

And third, if you have a namespace-identifier and know for sure that it
is an assigned namespace-identifier - how do you finally get the record?
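
To make the first of these points concrete, here is a minimal Python
sketch. It assumes, as stated above, that namespace-identifiers are
compared as plain case-sensitive strings while domain names are compared
case-insensitively. The function names are hypothetical and only serve
the illustration.

```python
def split_oai_identifier(identifier):
    """Split 'oai:<namespace-identifier>:<local-identifier>' into its parts."""
    # Split on the first two colons only; the local part may contain more.
    scheme, namespace, local = identifier.split(":", 2)
    if scheme != "oai":
        raise ValueError("not an oai-identifier: " + identifier)
    return namespace, local


def same_domain(a, b):
    # Domain names are compared case-insensitively.
    return a.lower() == b.lower()


def same_namespace(a, b):
    # Assumption from the text above: namespace-identifiers are case-sensitive.
    return a == b


ns, local = split_oai_identifier("oai:arXiv.org:quant-ph/9901001")
print(same_domain(ns, "arxiv.org"))     # True
print(same_namespace(ns, "arxiv.org"))  # False
```

Note also that neither of the two parts tells you where the repository
that holds the record can be reached, which is the third point.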

Your proposal seems to solve this issue but it just does not work:

> (b) the sensible management of identifiers within/under that domain:
> If last week I assigned an identifier to an item that disseminates
> metadata about a document on the topic of quantum slow motion and told
> the world that that identifier identified that resource, then this week
> I assign that same identifier to an item that disseminates metadata
> about a document on the topic of Sunderland Football Club and tell the
> world that that identifier identifies that item, then again the same
> identifier has been assigned to two different resources over a period of
> time, in this case, by a single agency.  And again, that argument
> applies in the same way to both an OAI Identifier like
> oai:my.org:quant-ph/9901001 and to an http: URI like
> http://my.org/quant-ph/9901001

If you assign the same identifier to one resource in one week and to
another resource in the next week then it's not an identifier anymore.
Identifiers are only assigned once in a lifetime. I know that some
ISBNs have been assigned twice, but this goes against the whole idea of
persistent identifiers.
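
The "assigned once" rule can be sketched in a few lines of Python. The
registry class and its method names are hypothetical and not part of any
OAI tool; the only point is that an identifier, once bound to a
resource, is never bound to a different one.

```python
class IdentifierRegistry:
    """Hypothetical registry that refuses to reassign an identifier."""

    def __init__(self):
        self._assigned = {}

    def assign(self, identifier, resource):
        # Repeating the same assignment is harmless; changing it is an error.
        if identifier in self._assigned and self._assigned[identifier] != resource:
            raise ValueError(identifier + " already identifies another resource")
        self._assigned[identifier] = resource

    def lookup(self, identifier):
        return self._assigned[identifier]


registry = IdentifierRegistry()
registry.assign("oai:my.org:quant-ph/9901001", "item about quantum slow motion")
try:
    registry.assign("oai:my.org:quant-ph/9901001", "item about Sunderland Football Club")
    reassigned = True
except ValueError:
    reassigned = False
print(reassigned)  # False
```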

It's less a technical problem than a social one. In theory we wouldn't
need URIs, URNs and all that stuff if people did not change their URLs
every week. But that's life.

> So - as far as I can see - in terms of identification, anything offered
> by the use of the OAI Identifier syntax I can obtain using the http: URI
> scheme.

In theory yes, but not in this world.

> i.e. if an OAI-PMH item has the identifier
> http://my.org/quant-ph/9901001 , a consumer can make use of the existing
> DNS/HTTP infrastructure to request a representation of the identified
> resource from the resource owner (using the HTTP protocol), and the
> resource owner can make available such a representation using that
> infrastructure. 

And next week the manager of my.org (who does not care about OAI) buys a
content management system so all your assigned identifiers produce a 404.

And next month the my.org company is sold or changes its name so the
domain name has to change because of branding issues.

And next year the my.org company goes bankrupt (or it was a funded project
and funding is over). Luckily a library is allowed to archive all the
metadata - of course the library won't buy the my.org domain.

I fear that the term "identifier" is used more and more loosely the
more popular it gets.

Greetings,
Jakob