[OAI-implementers] Inadequacy of Datestamps + repository IDs

Tim Brody tim@tim.brody.btinternet.co.uk
Tue, 5 Mar 2002 13:38:33 -0000


Dear all,

Datestamps:

Following on from Alan's email regarding datestamps, I would add another
datestamp show-stopper: federators.

My current understanding of how to build hierarchical harvesting is that the
federator changes each record's datestamp to the day of harvest. This means
that if a repository of 100,000 records is harvested, the federator ends up
with 100,000 records carrying the same datestamp.

May I suggest the following (which would seem in keeping with OAI's aim of
DP flexibility):
Relax "datestamp" to be any positive number (not necessarily unique) to
which the repository must be able to apply a "from" and "to" filter. New and
changed records must be given a number greater than that of the last record
in the repository.
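
As a rough illustration of the suggestion above, here is a minimal Python
sketch. The class and method names (SequenceRepository, upsert, list_records)
are made up for this example and are not part of the OAI protocol; the sketch
also happens to issue unique numbers, which the proposal does not require.

```python
from typing import Dict, List, Optional, Tuple


class SequenceRepository:
    """Toy repository that stamps records with an increasing counter."""

    def __init__(self) -> None:
        self._stamp: Dict[str, int] = {}  # identifier -> current stamp
        self._last = 0                    # highest stamp issued so far

    def upsert(self, identifier: str) -> int:
        """Add or change a record; it gets a stamp above all existing ones."""
        self._last += 1
        self._stamp[identifier] = self._last
        return self._last

    def list_records(self, from_: Optional[int] = None,
                     to: Optional[int] = None) -> List[Tuple[int, str]]:
        """Return (stamp, identifier) pairs with from_ <= stamp <= to, in stamp order."""
        lo = 0 if from_ is None else from_
        hi = self._last if to is None else to
        hits = [(s, i) for i, s in self._stamp.items() if lo <= s <= hi]
        return sorted(hits)


repo = SequenceRepository()
for name in ("a", "b", "c"):
    repo.upsert(name)
repo.upsert("a")  # record "a" changes, so it moves to a new, higher stamp
changed = repo.list_records(from_=3)  # everything stamped 3 or later
```

A harvester that last saw stamp 2 asks for everything from 3 onwards and
gets back both the new record and the changed one, without any reference to
wall-clock dates.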

If a repository can be required to provide records in "datestamp" order,
resuming from a broken response is simple (continue from the last index
received, inclusive).
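
A small sketch of why the resume has to be inclusive, assuming several
records may share one number. The record list, the flaky fetch function and
the harvest loop are all invented for illustration; a real harvester would
issue protocol requests instead.

```python
from typing import Callable, List, Set, Tuple

Record = Tuple[int, str]  # (stamp, identifier)

# Hypothetical repository contents, already in stamp order.
RECORDS: List[Record] = [(1, "a"), (2, "b"), (2, "c"), (3, "d"), (4, "e")]


def make_flaky_fetch(fail_after: int) -> Callable[[int], List[Record]]:
    """Build a fetch(from_) whose first response is cut off after fail_after records."""
    state = {"first": True}

    def fetch(from_: int) -> List[Record]:
        batch = [r for r in RECORDS if r[0] >= from_]
        if state["first"]:
            state["first"] = False
            return batch[:fail_after]  # simulate a broken response
        return batch

    return fetch


def harvest(fetch: Callable[[int], List[Record]], start: int = 0,
            max_requests: int = 10) -> List[Record]:
    """Harvest in stamp order, resuming from the last stamp seen (inclusive)."""
    seen: Set[str] = set()
    out: List[Record] = []
    from_ = start
    for _ in range(max_requests):
        new = [r for r in fetch(from_) if r[1] not in seen]
        if not new:
            break  # nothing we have not already stored
        for record in new:
            seen.add(record[1])
            out.append(record)
        from_ = out[-1][0]  # inclusive: other records may share this stamp
    return out


harvested = harvest(make_flaky_fetch(fail_after=2))
```

Here the first response breaks off after "b", which shares stamp 2 with "c".
Resuming from 2 inclusive picks up "c"; resuming from 3 would silently skip
it. The cost is that the harvester must discard records it already holds.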

As a datestamp is already an ordered number, this won't lose anything, but
will enable repositories where a datestamp makes no sense to provide
incremental harvesting.

Repository IDs:

Following from my email a while back, it seems the discussion on UIDing
repositories has come up with central vs. distributed listings. Simeon
Warner responded with:

> I don't see harm in allowing also '-' and '.'. (I wouldn't want to make it
> case insensitive.) However, without some enforceable policy about naming
> (avoiding the need for OAI registration which currently solves the
> uniqueness problem) does this really buy us anything? After all, I could
> use identifiers "http://arXiv.org/abs/hep-th/9901001" and such for arXiv
> but I choose to use the simpler oai scheme "oai:arXiv:hep-th/9901001".

I can't see a flat, short, centralised naming mechanism working in the long
run. XML namespaces are UIDed by URL (which is guaranteed to be unique in an
honest system). Assuming that bandwidth isn't a problem, it makes sense to
base identifiers on some kind of URL structure (why restrict the length of
identifiers?), for example:
oai:arXiv.org/oai1:hep-th/0001001
oai:cogprints.soton.ac.uk/perl/oai:cogprints/00001111
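
To show how identifiers of this shape could be taken apart, here is a
minimal Python sketch. The three-part layout (scheme, repository part, local
identifier) is read off the two examples above and is an assumption, as are
the names OaiIdentifier and parse_identifier; the sketch also assumes the
repository part contains no colon (so no port numbers).

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    repository: str  # DNS name plus optional path, e.g. "arXiv.org/oai1"
    host: str        # DNS name only, lower-cased for comparison
    local_id: str    # identifier within the repository


def parse_identifier(identifier: str) -> OaiIdentifier:
    """Split 'oai:<repository>:<local id>' on the first two colons only."""
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError("not a URL-style oai identifier: %r" % identifier)
    repository, local_id = parts[1], parts[2]
    # DNS names are case-insensitive, so normalise only the host part;
    # the rest of the identifier is left untouched.
    host = repository.split("/", 1)[0].lower()
    return OaiIdentifier(repository, host, local_id)


arxiv = parse_identifier("oai:arXiv.org/oai1:hep-th/0001001")
cogprints = parse_identifier("oai:cogprints.soton.ac.uk/perl/oai:cogprints/00001111")
```

Because the host part is an ordinary DNS name, two archives can only clash
if they clash in DNS, which is the point of the suggestion.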

Another thought about naming authorities: what happens if someone sues for
trademark infringement (say, a university archive on Microsoft legal
proceedings called "oai:microsoft/...")? If archive names are based on DNS,
well-established mechanisms for coping with name conflicts already exist.


All the best,
Tim Brody