FW: [OAI-implementers] Open Archives Initiative Protocol for Metadata Harvesting Version 2 news

Xiaoming Liu liu_x@cs.odu.edu
Thu, 07 Feb 2002 21:50:27 -0500



There may be a way to get stateless behavior within the current OAI protocol: encode the query
parameters in the resumptionToken itself.

For example, the format of the resumptionToken could be
from:to:Sets:MetadataSet:cursor

One example is:
resumptionToken=1999:2000:math:oai_dc:100

This way the state information is carried in the resumptionToken, and the data provider doesn't
need to keep it. I have seen some implementations that work this way.
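
To make the token idea concrete, here is a minimal sketch in Python. It is not taken from any
real implementation: the names HarvestState, encode_token, decode_token and page are made up
for illustration, and percent-encoding each field is my own assumption, added so that a ":"
inside a field (a timestamp, or a set name that contains a colon) cannot be mistaken for the
field separator.

```python
from typing import List, NamedTuple, Optional, Tuple
from urllib.parse import quote, unquote


class HarvestState(NamedTuple):
    """Everything the server needs to resume a list request (hypothetical layout)."""
    from_date: str
    until_date: str
    set_spec: str
    metadata_prefix: str
    cursor: int  # index of the first record of the next chunk


def encode_token(state: HarvestState) -> str:
    """Pack the state as from:to:set:metadata:cursor, percent-encoding each field."""
    fields = [state.from_date, state.until_date, state.set_spec,
              state.metadata_prefix, str(state.cursor)]
    return ":".join(quote(field, safe="") for field in fields)


def decode_token(token: str) -> HarvestState:
    """Recover the state from a token; raise ValueError if it is malformed."""
    parts = token.split(":")
    if len(parts) != 5:
        raise ValueError("malformed resumptionToken: %r" % token)
    from_date, until_date, set_spec, prefix, cursor = (unquote(p) for p in parts)
    return HarvestState(from_date, until_date, set_spec, prefix, int(cursor))


def page(matching: List[str], state: HarvestState,
         page_size: int) -> Tuple[List[str], Optional[str]]:
    """Return one chunk of the recomputed result list plus the next token.

    `matching` is the full result list for the query in `state`, recomputed
    on every request, so the server keeps nothing between requests.
    """
    chunk = matching[state.cursor:state.cursor + page_size]
    next_cursor = state.cursor + len(chunk)
    if next_cursor < len(matching):
        return chunk, encode_token(state._replace(cursor=next_cursor))
    return chunk, None  # last chunk: no token needed
```

With this sketch, decode_token("1999:2000:math:oai_dc:100") gives back the five query
parameters, and the server can recompute the list and continue from record 100 without having
stored anything for that harvester.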


--- Walter Underwood wrote:
> A request for all changes between two dates in the past should always get
> the same answer, so stateless harvesting should work.

This is a neat approach, but I am not sure how well the past is preserved in a digital library ;-)
In the OAI protocol especially, whenever a record is changed, its datestamp changes too. So even
a request for a past date range may not get the same answer.
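
A small Python sketch of what I mean, with made-up identifiers and dates and a toy in-memory
"repository" (a dict from identifier to datestamp); list_identifiers is a hypothetical helper,
not part of any OAI software:

```python
from typing import Dict, List


def list_identifiers(records: Dict[str, str], from_date: str, until_date: str) -> List[str]:
    """Return identifiers whose datestamp falls in [from_date, until_date].

    Datestamps are ISO dates (YYYY-MM-DD), so string comparison orders them correctly.
    """
    return sorted(ident for ident, stamp in records.items()
                  if from_date <= stamp <= until_date)


# Toy repository: identifier -> datestamp of the last change (invented values).
records = {
    "oai:example:1": "2001-03-10",
    "oai:example:2": "2001-06-01",
}

# First harvest of the closed window covering all of 2001.
before = list_identifiers(records, "2001-01-01", "2001-12-31")

# Record 1 is edited in 2002, so its datestamp moves out of the 2001 window.
records["oai:example:1"] = "2002-02-07"

# The very same request for the past now returns a different answer.
after = list_identifiers(records, "2001-01-01", "2001-12-31")
```

Here `before` contains both records and `after` contains only oai:example:2, although both
requests asked for exactly the same range in the past.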

regards,
liu

Walter Underwood wrote:

> Replying to two related messages ...
>
> --On Thursday, February 07, 2002 11:14:30 PM +0100 Martin Vesely <Martin.Vesely@cern.ch> wrote:
> >
> > The described way of caching data is very similar to how the OAI flow
> > control is done in our repository. But still, I do not see how we can
> > get rid of resumption tokens.
>
> A client can request elements 21-30 of a list, and get a response.
> That might be the very first request from that client. Or the first
> request after the server reboots. It could even go to a replica of
> the server. No resumption token is needed. Calculate the list, and
> return that portion of it.
>
> Here is a URL to get hits 21-30 about "face on mars" from the NASA
> search engine. No need to fetch hits 1-20 and get a resumption token.
> You can edit the "st" variable to change the start hit.
>
> http://search.spacelink.nasa.gov/query.html?col=library+xreflib&qt=face+on+mars&st=21&nh=10
>
> --On Wednesday, February 06, 2002 01:59:55 PM -0500 "Young,Jeff" <jyoung@oclc.org> wrote:
> >
> > I guess I'm saying that resumptionTokens don't necessarily guarantee you'll
> > get "all the new stuff", but could if appropriately implemented. The
> > stateless alternative, though, seems to assume an idealistically static
> > repository. If records are deleted from the repository, a stateless
> > harvesting solution doesn't seem to allow for the possibility of getting all
> > the new stuff.
>
> A request for all changes between two dates in the past should always get
> the same answer, so stateless harvesting should work. A half-open request,
> that is "until now", will have time-varying results. If harvesters always
> make requests with both from and until, and make sure that the until date
> is not in the future, then stateless harvesting is safe.
>
> There should be some way to get the current time at the repository.
> Clock skew will cause nasty problems in time-based harvesting. The only
> safe solution is to always use the clock at the server, and to require
> that it is non-decreasing.
>
> wunder
> --
> Walter Underwood
> wunder@inktomi.com
> Senior Staff Engineer, Inktomi
> http://www.inktomi.com/
>
> _______________________________________________
> OAI-implementers mailing list
> OAI-implementers@oaisrv.nsdl.cornell.edu
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
