[OAI-implementers] OAI identifier resolver

eds@library.caltech.edu
Tue, 21 Oct 2003 09:54:17 -0700

The identifiers used by OAI-PMH ought to be as rigorous and globally unique
as technically possible. There seems to be no reason not to.

But, maybe slightly off topic, there is also a need for persistent URLs
engineered for human consumption. Here I describe a simple way to produce
persistent URLs suitable for citations, scribbling on napkins, memorizing, or
communicating by voice or email. Data is pulled nightly from ETD-db and
Eprints.org archives into the resolver script.

Sponsler, Ed (2001) PURR - The Persistent URL Resource Resolver.

The purpose of these URLs is to give humans something easy to remember and
communicate via voice, email, napkin, etc. They make "clean" looking
citations and are good for placing within the documents they point to. Their
persistence is backed by our institute's policy to commit to the long-term
preservation of our CODA (digital archive) project.
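
To make the idea concrete, here is a minimal sketch of the lookup step such a
resolver performs. It is not the PURR implementation: the function names, the
identifier format, and the example.edu URLs are all made up for illustration,
and the nightly harvest is represented by a hard-coded list.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_table(harvested: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Rebuild the lookup table from (short_id, current_url) pairs.

    In a real resolver these pairs would come from the nightly harvest of
    the archives; here they are passed in directly (an assumption).
    """
    table: Dict[str, str] = {}
    for short_id, current_url in harvested:
        if short_id in table:
            # A persistent identifier must never point at two records.
            raise ValueError(f"duplicate identifier: {short_id}")
        table[short_id] = current_url
    return table


def resolve(table: Dict[str, str], short_id: str) -> Optional[str]:
    """Return the current location for a short identifier, or None."""
    return table.get(short_id)


# Hypothetical harvest results; identifiers and URLs are invented.
nightly_harvest = [
    ("ExampleETD:etd-0001", "http://etd.example.edu/view/etd-0001"),
    ("ExampleEprints:42", "http://eprints.example.edu/42/"),
]
table = build_table(nightly_harvest)
print(resolve(table, "ExampleETD:etd-0001"))
```

The short identifier stays fixed while the table behind it is rebuilt each
night, so the human-facing URL survives changes in the archive software.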

Ed Sponsler
Caltech Library System
Pasadena, CA USA
-----Original Message-----
From: Lonnie D. Harvel [mailto:ldh@ece.gatech.edu] 
Sent: Monday, October 20, 2003 5:02 PM
To: OAI-implementers (E-mail)
Subject: Re: [OAI-implementers] OAI identifier resolver

I am in favor of just the URL:[collection name] approach. Why make it more
complicated than necessary? URLs are unique. Is there a particular reason
why it needs to be shorter?

Adam Farquhar wrote:


Selecting an approach that will be certain to fail, but unpredictably, is
not a good 'engineering' approach, especially when there are other
approaches that do not fail.  For example, taking a base64 encoding of the
base URL or just using the base URL itself will both provide a unique


Hash algorithms such as MD5 or CRC32 cannot be used to generate unique
identifiers.  These algorithms will occasionally produce the same output for
different input strings (this is why hash tables require a mechanism for
dealing with collisions).  Common approaches to generating unique identifiers
use some sort of a registration mechanism to appropriately partition the space
of possible values.  Successful ones will leverage an existing registration
mechanism, such as DNS.
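
As a rough illustration of the base64 alternative mentioned earlier in this
thread, the sketch below encodes a base URL reversibly. Because the original
URL can always be recovered, two different URLs can never share an identifier,
which is not true of a fixed-size hash. The function names and the example.org
URLs are assumptions made for the example, not part of any OAI specification.

```python
import base64


def url_to_identifier(base_url: str) -> str:
    """Encode a base URL as URL-safe base64, dropping the '=' padding."""
    encoded = base64.urlsafe_b64encode(base_url.encode("utf-8"))
    return encoded.decode("ascii").rstrip("=")


def identifier_to_url(identifier: str) -> str:
    """Invert url_to_identifier by restoring the padding and decoding."""
    padding = "=" * (-len(identifier) % 4)
    return base64.urlsafe_b64decode(identifier + padding).decode("utf-8")


# Hypothetical base URLs, for illustration only.
url_a = "http://example.org/oai"
url_b = "http://example.org/oai2"
id_a = url_to_identifier(url_a)
id_b = url_to_identifier(url_b)
print(id_a, len(url_a), len(id_a))
```

Note the trade-off: the encoded identifier is about a third longer than the
URL it encodes, so this gives uniqueness and repeatability but not brevity.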

I agree that a hash algorithm is not a "perfect" way to generate a unique
identifier for a repository, but it may be acceptable from an engineering
perspective: the collision probability will be pretty low at the current scale
of OAI data providers (<500?). I think the basic problem is how to render an
OAI baseURL as a shorter, readable string in a collision-free way. The
algorithm should be repeatable: anyone can use the same algorithm to generate
the same output given a baseURL. I will be glad to see other approaches.
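
To put a number on "pretty low", here is a back-of-the-envelope estimate using
the standard birthday approximation. It assumes hash outputs are uniformly
distributed, which is only an approximation for real base URLs, and the
function name is made up for this sketch.

```python
import math


def collision_probability(num_items: int, hash_bits: int) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2 ** hash_bits
    exponent = num_items * (num_items - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


# Probability of at least one collision among 500 repositories.
for name, bits in (("CRC32", 32), ("MD5", 128)):
    print(f"{name}: {collision_probability(500, bits):.3g}")
```

For 500 repositories this comes to roughly 3 in 100,000 for a 32-bit hash and
a vanishingly small figure for a 128-bit one. That is small but not zero, which
is the point both sides of the thread are making.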