[OAI-implementers] OAI identifier resolver

Adam Farquhar adam.farquhar@alumni.utexas.net
Mon, 20 Oct 2003 17:04:49 -0500

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
  <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
<body text="#000000" bgcolor="#ffffff">
Selecting an approach that is certain to fail, but unpredictably,
is not a good 'engineering' approach, especially when there are other
approaches that do not fail.&nbsp; For example, taking a base64 encoding of
the base URL, or just using the base URL itself, will each provide a
unique identifier.<br>
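As a minimal sketch of the base64 idea (the function names and the example.org URLs are hypothetical, chosen only for illustration): because the encoding can be reversed, two different base URLs can never map to the same identifier.

```python
import base64


def encode_base_url(base_url):
    """Return a URL-safe base64 form of base_url.

    The mapping is reversible, so distinct base URLs always yield
    distinct identifiers (no collisions, unlike a hash).
    """
    return base64.urlsafe_b64encode(base_url.encode("utf-8")).decode("ascii")


def decode_identifier(identifier):
    """Recover the original base URL from an encoded identifier."""
    return base64.urlsafe_b64decode(identifier.encode("ascii")).decode("utf-8")


if __name__ == "__main__":
    # Hypothetical base URL, used only as an example.
    url = "http://example.org/oai"
    ident = encode_base_url(url)
    print(ident)
    print(decode_identifier(ident) == url)
```

The trade-off is length: the encoded form is about a third longer than the base URL, so this gives uniqueness and repeatability but not brevity.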
<blockquote type="cite">
  <blockquote type="cite">
    <pre wrap="">
Hash algorithms such as MD5 or CRC32 cannot be used to generate unique
identifiers.  These algorithms will occasionally produce the same output for
different input strings (this is why hash tables require a mechanism for dealing
with collisions).  Common approaches to generating unique identifiers use some
sort of a registration mechanism to appropriately partition the space of possible
values.  Successful ones will leverage an existing registration mechanism, such
as DNS.
  <pre wrap=""><!---->
I agree that a hash algorithm is not a "perfect" way to generate a unique
identifier for a repository, but it may be acceptable from an engineering
perspective: the collision probability will be pretty low at the current
scale of OAI data providers (&lt;500?).
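
The "pretty low" estimate can be checked with the standard birthday bound.
The sketch below assumes the hash output is uniformly distributed, which is
an idealization; the function name and the figure of 500 repositories are
illustrative only.

```python
import math


def collision_probability(n_items, hash_bits):
    """Probability that at least two of n_items share a hash value,
    assuming an ideal, uniformly distributed hash of hash_bits bits."""
    space = 2 ** hash_bits
    if n_items > space:
        return 1.0  # pigeonhole: a collision is unavoidable
    # Sum log(1 - k/space) rather than multiplying the factors directly,
    # so that very small probabilities are not rounded away to zero.
    log_no_collision = sum(math.log1p(-k / space) for k in range(n_items))
    return -math.expm1(log_no_collision)


if __name__ == "__main__":
    for bits in (32, 128):  # CRC32 and MD5 output sizes
        print(bits, collision_probability(500, bits))
```

For 500 repositories this gives a probability on the order of 3 in 100,000
for a 32-bit hash and a far smaller value for a 128-bit hash. The result is
small but never zero, and it grows roughly with the square of the number
of repositories.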

I think the basic problem is how to render an OAI baseURL as a shorter,
readable string in a collision-free way. The algorithm should be repeatable:
anyone can use the same algorithm to generate the same output for a given
baseURL. I will be glad to see other approaches.