[OAI-implementers] Better resumption mechanism - more important than ever!

Xiaoming Liu liu_x@cs.odu.edu
Tue, 5 Mar 2002 14:40:58 -0500 (EST)


On Tue, 5 Mar 2002, Young,Jeff wrote:

> 
> I'd be happy to implement stateless resumptionTokens, but unless harvesters
> know how to use them for recovery, why bother? How many harvesters today
> could manage a recovery using stateless resumptionTokens? How many
> harvesters will handle it tomorrow if OAI remains agnostic on the issue?

I guess the major issue is that harvesters should follow a defined retry
policy. In the current implementation in Arc, the harvester retries each
failed request at most three times, using the same HTTP request, and gives
up after that. This policy has really helped several times, but not
too often ;-)
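
A minimal sketch of that "three tries, then give up" policy, in Python
rather than Arc's actual code. The function name fetch_with_retries and
the opener parameter are made up for illustration; only the standard
library is assumed.

```python
import urllib.error
import urllib.request


def fetch_with_retries(url, max_attempts=3, opener=urllib.request.urlopen):
    """Issue the same HTTP request up to max_attempts times, then give up."""
    last_error = None
    for _ in range(max_attempts):
        try:
            # The request is repeated unchanged, resumptionToken included.
            with opener(url) as response:
                return response.read()
        except (urllib.error.URLError, OSError) as exc:
            last_error = exc  # remember the failure and try again
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

The opener argument is only there so the policy can be exercised without
a network; a real harvester would presumably also log each failure.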

Ideally, I guess a harvester could use an exponential backoff algorithm to
keep trying until the resumptionToken expires (considering that a
time-to-live parameter will be added in 2.0). And if we implement the
harvester with multiple processes/threads, the system should still scale
well when several resumptionTokens hit errors.
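
Roughly, the backoff idea could look like the sketch below. It assumes the
harvester already knows when the token expires (for example from the
expiration attribute planned for 2.0); backoff_delays, resume_with_backoff
and their parameters are hypothetical names, and the time budget is
approximated by summing the delays, ignoring time spent in the requests
themselves.

```python
import time
from datetime import datetime, timezone


def backoff_delays(expires_at, base=1.0, factor=2.0, cap=3600.0, now=None):
    """Yield growing delays (seconds) that still fit before expires_at."""
    now = now or datetime.now(timezone.utc)
    remaining = (expires_at - now).total_seconds()
    delay, elapsed = base, 0.0
    while elapsed + delay <= remaining:
        yield delay
        elapsed += delay
        delay = min(delay * factor, cap)  # double the wait, up to a ceiling


def resume_with_backoff(request, expires_at, sleep=time.sleep, **kwargs):
    """Retry request() with exponential backoff until the token expires."""
    try:
        return request()
    except OSError:
        pass  # first attempt failed; start backing off
    for delay in backoff_delays(expires_at, **kwargs):
        sleep(delay)
        try:
            return request()
        except OSError:
            continue
    raise TimeoutError("resumptionToken expired before the request succeeded")
```

With ten seconds left on the token and the defaults above, the waits are
1, 2 and 4 seconds; an 8 second wait would overshoot the expiry, so the
harvester gives up at that point.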

I think something like an "implementation guide" or a "reference
implementation" would help harvesters and DPs understand each other on
matters beyond the core protocol.

regards,
liu



 

> 
> I'm sure ETDCat needs more stress testing to minimize future failures. The
> fact that we've discussed this before, though, indicates a recognition that
> problems can happen. I don't blame Alan if he doesn't want to negotiate
> special rules for harvesting ETDCat merely because the risk is proportional
> to the size of the repository.
> 
> Jeff
> 
> -----Original Message-----
> From: Michael L. Nelson [mailto:mln@ils.unc.edu]
> Sent: Tuesday, March 05, 2002 10:03 AM
> To: 'Alan Kent'
> Cc: OAI Implementors
> Subject: Re: [OAI-implementers] Better resumption mechanism - more
> important than ever!
> 
> 
> 
> actually, the way I see it is the protocol should not be complicated with
> additional tokens and such to enforce what ETDCat (and similarly
> large-sized DPs) should do:
> 
> 1.  partition their collection into sets
> 2.  use stateless (or very long lived) resumptionTokens
> 
> in 2.0, resumptionTokens will have optional attributes, including
> "expirationDate", so this will take the guesswork out of knowing how long
> a resumptionToken will be valid.
> 
> IMO, introducing an optional restartToken is no different (from an
> implementer's point of view) than making the resumptionToken last a long
> time.  
> 
> at some point, you (as a harvester) are simply at the mercy of the
> repository.  new features in the protocol won't change that.
> 
> regards,
> 
> Michael
> 
> On Tue, 5 Mar 2002, 'Alan Kent' wrote:
> 
> > I just got some mail from Jeff at OCLC talking about ETDCat (hope
> > you don't mind me quoting some of your mail Jeff). In particular,
> > he just told me
> > 
> >     ETDCat contains a lot of records (over 4 million), all of
> >     which currently have the exact same datestamp from the initial load.
> > 
> > He also told me that there were no sets. So basically, it's all
> > or nothing for this site because OAI has no standard way to resume
> > if a transfer fails.
> > 
> > If this has happened already, I think it's likely to occur again.
> > (That is, one very large database all with the same time stamp.)
> > So any comments about having a single large collection like this
> > are beside the point. The point is OAI does not handle it well.
> > 
> > So I would like to resurrect the discussion, if people don't
> > mind, on how to support restarts. My understanding of the general
> > feeling so far is
> > 
> > (1) Mandating support is not going to be acceptable
> > 
> > (2) Mandating format of resumption tokens is not going to be acceptable
> > 
> > (3) Mandating resumption tokens be long-lived (e.g., can try again the
> >     following day) is not acceptable
> > 
> > (4) In fact, mandating that resumption tokens be unique (allowing
> >     a token to be reused twice in quick succession to get the same
> >     data) is not acceptable
> > 
> > So any proposal needs to be optionally supported.
> > 
> > Question time:
> > 
> > Does anyone else think that this is a major hole in OAI? I personally
> > do. When crawling sites, things go wrong. The larger the site,
> > the greater the probability that something will go wrong. The larger
> > the site, the greater the pain of starting all over again. I do not
> > think it is practical for anyone to harvest ETDCat if it has really got
> > 4,000,000 records. Any fault, and you start downloading that 4 GB again!
> > So I feel strongly on this one. In fact, I think this is the
> > biggest problem OAI has.
> > 
> > Do people think it's better to reuse resumption tokens for this purpose,
> > or introduce a different sort of token? ETDCat for example I think
> > allocates a session id in resumption tokens, meaning they cannot
> > be reused when the session times out in the server (similar semantics
> > anyway). This is a reasonable implementation decision to make.
> > So maybe it's better for servers to return an additional token,
> > a <restartToken>, so that a client can, instead of
> > specifying from= and to= again, specify restartToken= and
> > the server then automatically works out whatever other parameters
> > it needs, creates a new session etc. internally. The new 'session'
> > (ListXXX verb) then can use resumptionTokens to manage that new
> > transfer.
> > 
> > The idea is for a <restartToken> to be long-lived. It may be less
> > efficient to use than a resumptionToken, but its only purpose is
> > recovery if the client's download fails. If a server does not support
> > restartToken, it simply never returns one. Large collections *should*
> > support restartTokens.
> > 
> > For my harvester, I can then remember (to disk) the restartToken for
> > every packet I get back, allowing me to recover much more easily
> > if anything crashes. If restartTokens are too hard for someone
> > to implement, then they don't. If you have a large data collection
> > on the other hand, to reduce network load, I think it's probably worth
> > the extra effort of supporting restartTokens.
> > 
> > Any comments? Better suggestions?
> > 
> > Alan
> > _______________________________________________
> > OAI-implementers mailing list
> > OAI-implementers@oaisrv.nsdl.cornell.edu
> > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
> > 
> 
> ---
> Michael L. Nelson
> NASA Langley Research Center		m.l.nelson@larc.nasa.gov
> MS 158, Hampton, VA 23681		http://www.ils.unc.edu/~mln/
> +1 757 864 8511				+1 757 864 8342 (f)
> 
> 