[OAI-implementers] Inadequacy of Datestamps + repository IDs

Michael L. Nelson mln@ils.unc.edu
Tue, 5 Mar 2002 10:29:36 -0500 (EST)


Tim,

datestamps:

2.0 will require at least second level granularity in the protocol
responses (decimal fractions of a second are optional).  aggregators
should then allow harvesting on at least second level granularity.  this
will result in the monotonically increasing values that you suggest below,
since it's unlikely that a repository can load many thousands of records
in a single second (or 1/10th of a second).  and if it can load that many
per second, it should slow down -- there's no rush ;-).

repositories will have the option of what level of time granularity they
support for harvesting, but obviously large volume federators will want to
support fine grained harvesting.  there will be mechanisms for determining
what granularity a repository supports.
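
to make the from/until idea concrete, here is a minimal sketch of
selecting records at second-level granularity.  it is only an
illustration: the stamp formats, the function names and the example.org
records are assumptions of mine, not protocol text.

```python
from datetime import datetime, timezone

# Assumed stamp formats (not taken from the protocol document):
# day-level and second-level UTC datestamps.
DAY_FORMAT = "%Y-%m-%d"
SECOND_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_datestamp(stamp):
    """Parse a day- or second-granularity stamp into an aware UTC datetime."""
    fmt = SECOND_FORMAT if "T" in stamp else DAY_FORMAT
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)


def select_records(records, from_stamp, until_stamp):
    """Return the (identifier, datestamp) pairs with from <= datestamp <= until."""
    start = parse_datestamp(from_stamp)
    end = parse_datestamp(until_stamp)
    return [
        (ident, stamp)
        for ident, stamp in records
        if start <= parse_datestamp(stamp) <= end
    ]


# Made-up records loaded on the same day, one second apart.
records = [
    ("oai:example.org:1", "2002-03-05T10:29:36Z"),
    ("oai:example.org:2", "2002-03-05T10:29:37Z"),
    ("oai:example.org:3", "2002-03-05T10:29:38Z"),
]

# Resume an interrupted harvest from the last datestamp seen (inclusive).
resumed = select_records(records, "2002-03-05T10:29:37Z", "2002-03-05T10:29:38Z")
print([ident for ident, _ in resumed])
# prints ['oai:example.org:2', 'oai:example.org:3']
```

with day-level stamps all three records would carry the same value and
the harvester could not tell them apart; with second-level stamps it can
pick up exactly where it stopped.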

repository ids:

I agree; an approach that piggy-backs on DNS names will probably be a good
one.  I'm not sure we can *require* people to use DNS names, but they will
probably migrate to them just to ensure uniqueness.
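
as a rough sketch of what piggy-backing on DNS could look like, the
snippet below splits an identifier of the hypothetical form
"oai:<repository-id>:<local-id>" and checks whether the repository id
looks like a DNS name.  the layout, the function names and the hostname
pattern are assumptions for illustration only; nothing here is a
decided rule.

```python
import re

# Assumed hostname shape: dot-separated labels of letters, digits and
# hyphens, with at least two labels (e.g. "arXiv.org").
DNS_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
DNS_NAME = re.compile(rf"{DNS_LABEL}(?:\.{DNS_LABEL})+")


def split_identifier(identifier):
    """Split 'oai:<repository-id>:<local-id>' into (repository_id, local_id)."""
    # maxsplit=2 keeps any further colons inside the local id.
    scheme, repository_id, local_id = identifier.split(":", 2)
    if scheme != "oai":
        raise ValueError(f"not an oai identifier: {identifier!r}")
    return repository_id, local_id


def has_dns_repository_id(identifier):
    """True if the repository id part looks like a DNS name."""
    repository_id, _ = split_identifier(identifier)
    return DNS_NAME.fullmatch(repository_id) is not None


print(has_dns_repository_id("oai:arXiv.org:hep-th/9901001"))  # prints True
print(has_dns_repository_id("oai:arXiv:hep-th/9901001"))      # prints False
```

a check like this only tests the shape of the name; whether the name is
actually registered to the repository is a matter for DNS itself.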

regards,

Michael

On Tue, 5 Mar 2002, Tim Brody wrote:

> Dear all,
> 
> Datestamps:
> 
> Following on from Alan's email regarding datestamps, I would add another
> datestamp show-stopper:
> Federators
> 
> My current understanding of how to build hierarchical harvesting is to change
> the record datestamp to the day of harvest. This means if a repository of
> 100,000 records is harvested, that will be 100,000 similarly dated records.
> 
> May I suggest the following (which would seem in keeping with OAI's aim of
> DP flexibility):
> Relax "datestamp" to be any positive number (not necessarily exclusive),
> which the repository must be capable of applying a "from" and "to" filter.
> New and changed records must have a number greater than the last record in
> the repository.
> 
> If a repository can be required to provide records in "datestamp" order,
> resuming from a broken response is simple (continue from last index
> inclusive).
> 
> As a datestamp is already an ordered number, this won't lose anything, but
> will enable repositories where a datestamp makes no sense to provide
> incremental harvesting.
> 
> repository IDs:
> 
> Following from my email a while back, it seems the discussion on UIDing
> repositories has come up with central vs. distributed listings. Simeon
> Warner responded with:
> 
> > I don't see harm in allowing also '-' and '.'. (I wouldn't want to make it
> > case insensitive.) However, without some enforceable policy about naming
> > (avoiding the need for OAI registration which currently solves the
> > uniqueness problem) does this really buy us anything? After all, I could
> > use identifiers "http://arXiv.org/abs/hep-th/9901001" and such for arXiv
> > but I choose to use the simpler oai scheme "oai:arXiv:hep-th/9901001".
> 
> I can't see a flat, short, centralised naming mechanism working in the
> long-run. XML namespaces are UIDed by URL (which is guaranteed to be unique
> in an honest system). Assuming that bandwidth isn't a problem, then it makes
> sense to have identifiers based on some kind of URL structure - why restrict
> the length of identifiers?:
> oai:arXiv.org/oai1:hep-th/0001001
> oai:cogprints.soton.ac.uk/perl/oai:cogprints/00001111
> 
> Another thought about naming authorities: What happens if someone sues for
> trademark infringement (a university archive on microsoft legal proceedings
> called "oai:microsoft/...")? If archives are based on DNS, they already
> have well-established mechanisms for coping with name conflicts.
> 
> 
> All the best,
> Tim Brody
> 

---
Michael L. Nelson
NASA Langley Research Center		m.l.nelson@larc.nasa.gov
MS 158, Hampton, VA 23681		http://www.ils.unc.edu/~mln/
+1 757 864 8511				+1 757 864 8342 (f)