FW: [OAI-implementers] Open Archives Initiative Protocol for Metadata Harvesting Version 2 news

Walter Underwood wunder@inktomi.com
Thu, 07 Feb 2002 15:06:48 -0800


Replying to two related messages ...

--On Thursday, February 07, 2002 11:14:30 PM +0100 Martin Vesely <Martin.Vesely@cern.ch> wrote:
> 
> The described way of caching data is very similar to how the OAI flow
> control is done in our repository. But still, I do not see how we can
> get rid of resumption tokens.

A client can request elements 21-30 of a list, and get a response.
That might be the very first request from that client. Or the first
request after the server reboots. It could even go to a replica of
the server. No resumption token is needed. Calculate the list, and 
return that portion of it.

Here is a URL to get hits 21-30 about "face on mars" from the NASA
search engine. No need to fetch hits 1-20 and get a resumption token.
You can edit the "st" variable to change the start hit.

http://search.spacelink.nasa.gov/query.html?col=library+xreflib&qt=face+on+mars&st=21&nh=10
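
A rough sketch of the same idea in Python, for illustration only. The
function names are made up, and reading "nh" as "number of hits" is my
assumption from the URL above; only "st" is described in this message.
The point is that the server needs nothing but the start and count to
answer, so no state survives between requests.

```python
from urllib.parse import urlencode


def slice_of_list(compute_list, start, count):
    """Return `count` items beginning at 1-based position `start`.

    `compute_list` is a hypothetical callable that recalculates the
    full, deterministically ordered list on every request.
    """
    items = compute_list()
    first = start - 1  # convert the 1-based start to a 0-based index
    return items[first:first + count]


def hits_url(base, collections, query, start, count):
    """Build a search URL shaped like the example above."""
    # "nh" is assumed to mean "number of hits" (not confirmed here).
    params = {"col": collections, "qt": query, "st": start, "nh": count}
    return base + "?" + urlencode(params)
```

Calling slice_of_list(compute, 21, 10) on a fresh server, or on a
replica, gives the same ten items as long as compute() produces the
same ordered list.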

--On Wednesday, February 06, 2002 01:59:55 PM -0500 "Young,Jeff" <jyoung@oclc.org> wrote:
> 
> I guess I'm saying that resumptionTokens don't necessarily guarantee you'll
> get "all the new stuff", but could if appropriately implemented. The
> stateless alternative, though, seems to assume an idealistically static
> repository. If records are deleted from the repository, a stateless
> harvesting solution doesn't seem to allow for the possibility of getting all
> the new stuff. 

A request for all changes between two dates in the past should always get
the same answer, so stateless harvesting should work. A half-open request
(that is, "until now") will have time-varying results. If harvesters always
make requests with both from and until, and make sure that the until date
is not in the future, then stateless harvesting is safe.
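
A minimal sketch of that rule, again in Python with hypothetical names.
It assumes each record is a (datestamp, identifier) pair, that a change
is stamped with the server time at which it happens, and that both
bounds are inclusive. None of this is taken from the protocol text.

```python
from datetime import datetime


def changes_between(records, from_, until, server_now):
    """Return records stamped within the closed window [from_, until].

    `records` is an iterable of (datestamp, identifier) pairs and
    `server_now` is the repository's own idea of the current time.
    """
    if until > server_now:
        # A window that reaches into the future can still gain records,
        # so its answer would not be repeatable.
        raise ValueError("until must not be later than the server time")
    if from_ > until:
        raise ValueError("from must not be later than until")
    return sorted(r for r in records if from_ <= r[0] <= until)
```

Because the window is closed and entirely in the past, repeating the
request later returns the same records even after newer ones arrive.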

There should be some way to get the current time at the repository.
Clock skew will cause nasty problems in time-based harvesting. The only
safe solution is to always use the clock at the server, and to require
that it is non-decreasing.
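
One way a repository could enforce "non-decreasing" is sketched below.
RepositoryClock is a made-up name, and wrapping time.time is just one
possible choice of underlying clock.

```python
import threading
import time


class RepositoryClock:
    """Wrap a time source so that reported times never go backwards."""

    def __init__(self, source=time.time):
        self._source = source          # underlying clock, may jump back
        self._last = float("-inf")     # largest value reported so far
        self._lock = threading.Lock()  # keep now() safe across threads

    def now(self):
        with self._lock:
            # If the underlying clock steps back, repeat the last value.
            self._last = max(self._last, self._source())
            return self._last
```

A harvester would then take its until value from this clock, as
reported by the repository, rather than from its own local time.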

wunder
--
Walter Underwood
wunder@inktomi.com
Senior Staff Engineer, Inktomi
http://www.inktomi.com/