[OAI-implementers] OAI identifier resolver

Patrick CH Hochstenbach <hochsten@lanl.gov>
Mon, 20 Oct 2003 21:39:17 -0600 (MDT)


An MD5 hash is an incredibly safe hash to use. MD5 produces 128-bit
values, so you would need to generate about 2^64 unique identifiers
before you could expect an accidental collision. That would take
about 585 years if you created about 1,000,000,000 identifiers
per second.
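
For anyone who wants to check the arithmetic, here is a minimal Python
sketch. The function names and the example baseURL are made up for
illustration, and the collision estimate uses the usual birthday
approximation, assuming MD5 output behaves like uniformly random 128-bit
values for ordinary (non-adversarial) input.

```python
import hashlib
import math

MD5_BITS = 128
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def identifiers_for_likely_collision(bits=MD5_BITS):
    """Birthday bound: roughly 2^(bits/2) random values before a collision is likely."""
    return 2 ** (bits // 2)


def years_to_generate(count, per_second=1_000_000_000):
    """Years needed to create `count` identifiers at `per_second` identifiers per second."""
    return count / per_second / SECONDS_PER_YEAR


def collision_probability(n, bits=MD5_BITS):
    """Approximate chance of at least one collision among n random hashes:
    p = 1 - exp(-n(n-1) / 2^(bits+1))."""
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


def repository_id(base_url):
    """Hypothetical repository name: the hex MD5 digest of the baseURL.
    Anyone hashing the same baseURL gets the same 32-character string."""
    return hashlib.md5(base_url.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    needed = identifiers_for_likely_collision()
    print("identifiers needed:", needed)
    print("years at 1e9 per second:", round(years_to_generate(needed)))
    print("collision chance for 500 repositories:", collision_probability(500))
    # http://example.org/oai is a placeholder, not a real repository.
    print("repository id:", repository_id("http://example.org/oai"))
```

With the roughly 500 data providers mentioned further down in this
thread, the estimated chance of an accidental collision is far too small
to matter in practice.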

Patrick

On Mon, 20 Oct 2003, Xiaoming Liu wrote:

> 
> On Mon, 20 Oct 2003, Lonnie D. Harvel wrote:
> 
> >
> > I am in favor of just the URL:[collection name] approach.  Why make it
> > more complicated than necessary? URLs are unique. Is there a particular
> > reason why it needs to be shorter?
> 
> This goes back to the question of why we need a resolver. If both the
> baseURL and the record identifier are supplied, it doesn't make a lot of
> sense to develop a resolver. I think the motivation is to provide a "cool"
> URL for each record, and to make it easy to exchange information via the
> REST model.
> 
> OAI has no centralized mechanism for maintaining unique repository names.
> It is either done by one centralized registry, like the UIUC registry, or
> done in a distributed way, like hashing the baseURL or some better method.
> In the distributed way, I can add a link to the Purl-OAI resolver without
> prior knowledge of how repository names are maintained in the Purl-OAI
> resolver. That's my reason for favoring the distributed method.
> 
> xiaoming
> 
> 
> 
> 
> 
> >
> > Adam Farquhar wrote:
> >
> > > Xiaoming,
> > >
> > > Selecting an approach that will be certain to fail, but unpredictably,
> > > is not a good 'engineering' approach, especially when there are other
> > > approaches that do not fail.  For example, taking a base64 encoding of
> > > the base URL or just using the base URL itself will both provide a
> > > unique identifier.
> > >
> > > Adam.
> > >
> > >>>Hash algorithms such as MD5 or CRC32 cannot be used to generate unique
> > >>>identifiers.  These algorithms will occasionally produce the same output for
> > >>>different input strings (this is why hash tables require a mechanism for dealing
> > >>>with collisions).  Common approaches to generating unique identifiers use some
> > >>>sort of a registration mechanism to appropriately partition the space of possible
> > >>>values.  Successful ones will leverage an existing registration mechanism, such
> > >>>as DNS.
> > >>>
> > >>>
> > >>
> > >>I agree that a hash algorithm is not a "perfect" way to generate a unique
> > >>identifier for a repository, but it may be acceptable from an engineering
> > >>perspective: the collision probability will be pretty low at the current
> > >>scale of OAI data providers (<500?).
> > >>
> > >>I think the basic problem is how to render an OAI baseURL as a shorter,
> > >>readable string in a collision-free way. The algorithm should be repeatable:
> > >>anyone can use the same algorithm to generate the same output given a baseURL.
> > >>I will be glad to see other approaches.
> > >>
> > >>
> > >>
> > > _______________________________________________ OAI-implementers
> > > mailing list List information, archives, preferences and to
> > > unsubscribe:
> > > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
> >
> >
> >
> 

-- 
 Patrick Hochstenbach  -------------------,   ,==.    ,----------   PO Box 1663 
 Los Alamos National Laboratory          /   /@  |   /               Los Alamos 
 Research Library, MS P362              /   /_  <   /     New Mexico 87544-7113 
 +1 (505) 665 1475 -------------------'    =" `g'  '--------  hochsten@lanl.gov