[OAI-implementers] Better resumption mechanism - more important than ever!

Xiaoming Liu liu_x@cs.odu.edu
Tue, 5 Mar 2002 17:04:18 -0500 (EST)


On Tue, 5 Mar 2002, Simeon Warner wrote:

> 
> Liu, your policy is the sort of thing I had imagined. However, I'm curious
> about how frequently you find that a sequence of harvests fails. When I
> last did an extensive harvest (last summer) I found that, provided
> repositories had implemented the protocol properly, I rarely had problems
> getting successful responses to complete a List request. Can you give us
> some (approximate) statistics?


I started the last extensive (historical) harvest on Feb 5. Usually this is
not a problem in our daily (fresh) harvesting. The failures may be caused by
a problem on my side.

Five archives failed partway through a sequence of harvest requests (using
resumptionToken): cimi, etdcat, conoze, dlpscoll, hsss. I did not dig into
the details, however. See the harvest log below:

etdcat: 
http://alcme.oclc.org/etdcat/servlet/OAIHandler?verb=ListRecords&resumptionToken=1013006344644%3A22000%3Aoai_dc
Wed Feb 06 09:40:20 EST 2002
java.net.SocketException: errno: 101, error: Network is unreachable for
fd: 11
java.net.SocketException: errno: 101, error: Network is unreachable for
fd: 11/

hsss:

http://hsss.slub-dresden.de/hsss/servlet/hsss.oai.OAIServlet?verb=ListRecords&resumptionToken=rt10130378386093615p9
Wed Feb 06 18:19:51 EST 2002
java.io.IOException: Server returned HTTP response code: 400 for URL:
http://hsss.slub-dresden.de/hsss/servlet/hsss.oai.OAIServlet?verb=ListRecords&resumptionToken=rt10130378386093615p9

cimi:

http://www.cimi.org/servlet/oai?verb=ListRecords&resumptionToken=5fs9fuzq3mcj471dredxulbv2rjfihmg
Thu Feb 07 01:00:56 EST 2002
java.io.IOException: Server returned HTTP response code: 400 for URL:
http://www.cimi.org/servlet/oai?verb=ListRecords&resumptionToken=5fs9fuzq3mcj471dredxulbv2rjfihmg
responsecode:400

dlpscoll:
http://www.hti.umich.edu/cgi/b/broker/broker?verb=ListRecords&resumptionToken=evd-bib,2200-01-01,1800-01-01,oai_dc,evd-bib,evd-bib.dd,200,
Thu Feb 07 06:38:05 EST 2002
java.io.IOException: Server returned HTTP response code: 500 for URL:
http://www.hti.umich.edu/cgi/b/broker/broker?verb=ListRecords&resumptionToken=evd-bib,2200-01-01,1800-01-01,oai_dc,evd-bib,evd-bib.dd,200,
responsecode:500

conoze:
http://www.conoze.com/interfaz/oai/index.php?verb=ListRecords&resumptionToken=50
Sun Mar 03 07:34:50 EST 2002
java.io.IOException: Server returned HTTP response code: 400 for URL:
http://www.conoze.com/interfaz/oai/index.php?verb=ListRecords&resumptionToken=50
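
For transient failures like the "Network is unreachable" case logged above,
the backoff policy discussed in this thread could look roughly like the
Python sketch below. It is only an illustration under stated assumptions:
fetch, expires_at, TokenExpired and the base/factor/cap values are all
hypothetical names and defaults, and expires_at stands in for an expiry time
such as the expirationDate attribute planned for 2.0.

```python
import time


class TokenExpired(Exception):
    """Hypothetical error: the token's assumed expiry passed before a retry succeeded."""


def fetch_with_backoff(fetch, expires_at, base=1.0, factor=2.0, cap=300.0,
                       now=time.time, sleep=time.sleep):
    """Call fetch() until it succeeds or the token's expiry time would be passed.

    fetch      -- hypothetical zero-argument callable that issues one List
                  request with the current resumptionToken and raises OSError
                  on a transient failure.
    expires_at -- assumed token expiry as a Unix timestamp.
    now, sleep -- injectable clock and sleep functions, so the policy can be
                  exercised without real waiting.
    """
    delay = base
    while True:
        try:
            return fetch()
        except OSError as exc:
            # Stop if the next wait would carry us past the token's expiry.
            if now() + delay >= expires_at:
                raise TokenExpired("token expired before a retry succeeded") from exc
            sleep(delay)
            # Grow the delay geometrically, but never beyond the cap.
            delay = min(delay * factor, cap)
```

Responses like the HTTP 400s above probably mean the repository no longer
accepts the token, in which case a harvester would likely restart that List
request instead of retrying; that branch is left out of the sketch.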


regards,
liu



>  
> > Ideally, I guess a harvester could use an exponential backoff algorithm to
> > keep trying until the resumptionToken expires (considering that a
> > time-to-live parameter will be added in 2.0). And if we implement the
> > harvester in a multi-process/multi-threaded way, the system should scale
> > well even with several resumptionToken errors.
> > 
> > I think something like an "implementation guide" or "reference
> > implementation" would help harvesters and DPs understand each other
> > beyond the core protocol.
> 
> Yes, this should certainly be covered in the implementation guidelines.
> 
> Cheers,
> Simeon.
>  
> 
> > regards,
> > liu
> > > 
> > > I'm sure ETDCat needs more stress testing to minimize future failures. The
> > > fact that we've discussed this before, though, indicates a recognition that
> > > problems can happen. I don't blame Alan if he doesn't want to negotiate
> > > special rules for harvesting ETDCat merely because the risk is proportional
> > > to the size of the repository.
> > > 
> > > Jeff
> > > 
> > > -----Original Message-----
> > > From: Michael L. Nelson [mailto:mln@ils.unc.edu]
> > > Sent: Tuesday, March 05, 2002 10:03 AM
> > > To: 'Alan Kent'
> > > Cc: OAI Implementors
> > > Subject: Re: [OAI-implementers] Better resumption mechanism - more
> > > important than ever!
> > > 
> > > actually, the way I see it is the protocol should not be complicated with
> > > additional tokens and such to enforce what ETDCat (and similarly
> > > large-sized DPs) should do:
> > > 
> > > 1.  partition their collection into sets
> > > 2.  use stateless (or very long lived) resumptionTokens
> > > 
> > > in 2.0, resumptionTokens will have optional attributes, including
> > > "expirationDate", so this will take the guesswork out of knowing how long
> > > a resumptionToken will be valid.
> > > 
> > > IMO, introducing an optional restartToken is no different (from an
> > > implementer's point of view) than making the resumptionToken last a long
> > > time.  
> > > 
> > > at some point, you (as a harvester) are simply at the mercy of the
> > > repository.  new features in the protocol won't change that.
> > > 
> > > regards,
> > > 
> > > Michael
> > > 
> > > On Tue, 5 Mar 2002, 'Alan Kent' wrote:
> > > 
> > > > I just got some mail from Jeff at OCLC talking about ETDCat (hope
> > > > you don't mind me quoting some of your mail Jeff). In particular,
> > > > he just told me
> > > > 
> > > >     ETDCat contains a lot of records (over 4 million), all of
> > > >     which currently have the exact same datestamp from the initial load.
> > > > 
> > > > He also told me that there were no sets. So basically, it's all
> > > > or nothing for this site because OAI has no standard way to resume
> > > > if a transfer fails.
> > > > 
> > > > If this has happened already, I think it's likely to occur again.
> > > > (That is, one very large database all with the same time stamp.)
> > > > So any comments about having a single large collection like this
> > > > are beside the point. The point is OAI does not handle it well.
> > > > 
> > > > So I would like to resurrect the discussion, if people don't
> > > > mind, on how to support restarts. My understanding of the general
> > > > feeling so far is
> > > > 
> > > > (1) Mandating support is not going to be acceptable
> > > > 
> > > > (2) Mandating format of resumption tokens is not going to be acceptable
> > > > 
> > > > (3) Mandating resumption tokens be long-lived (e.g., can try again the
> > > >     following day) is not acceptable
> > > > 
> > > > (4) In fact, mandating that resumption tokens be unique (allowing
> > > >     a token to be reused twice in quick succession to get the same
> > > >     data) is not acceptable
> > > > 
> > > > So any proposal needs to be optionally supported.
> > > > 
> > > > Question time:
> > > > 
> > > > Does anyone else think that this is a major hole in OAI? I personally
> > > > do. When crawling sites, things go wrong. The larger the site,
> > > > the greater the probability that something will go wrong. The larger
> > > > the site, the greater the pain of starting all over again. I do not
> > > > think it is practical for anyone to harvest ETDCat if it has really got
> > > > 4,000,000 records. Any fault, and start downloading that 4 GB again!
> > > > So I feel strongly on this one. In fact, I think this is the
> > > > biggest problem OAI has.
> > > > 
> > > > Do people think it's better to reuse resumption tokens for this purpose,
> > > > or introduce a different sort of token? ETDCat for example I think
> > > > allocates a session id in resumption tokens, meaning they cannot
> > > > be reused when the session times out in the server (similar semantics
> > > > anyway). This is a reasonable implementation decision to make.
> > > > So maybe it's better for servers to return an additional token,
> > > > a <restartToken>, so that a client can, instead of specifying
> > > > from= and to= again, specify restartToken=, and the server then
> > > > automatically works out whatever other parameters it needs,
> > > > creates a new session etc. internally. The new 'session'
> > > > (ListXXX verb) can then use resumptionTokens to manage that new
> > > > transfer.
> > > > 
> > > > The idea is for a <restartToken> to be long-lived. It may be less
> > > > efficient to use than a resumptionToken, but its only purpose is
> > > > recovery when the client's download fails. If a server does not support
> > > > restartToken, it simply never returns one. Large collections *should*
> > > > support restartTokens.
> > > > 
> > > > For my harvester, I can then remember (to disk) the restartToken for
> > > > every packet I get back, allowing me to recover much more easily
> > > > if anything crashes. If restartTokens are too hard for someone
> > > > to implement, then they don't have to. If you have a large data
> > > > collection, on the other hand, I think it's probably worth the extra
> > > > effort of supporting restartTokens to reduce network load.
> > > > 
> > > > Any comments? Better suggestions?
> > > > 
> > > > Alan
> > > > _______________________________________________
> > > > OAI-implementers mailing list
> > > > OAI-implementers@oaisrv.nsdl.cornell.edu
> > > > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
> > > > 
> > > 
> > > ---
> > > Michael L. Nelson
> > > NASA Langley Research Center		m.l.nelson@larc.nasa.gov
> > > MS 158, Hampton, VA 23681		http://www.ils.unc.edu/~mln/
> > > +1 757 864 8511				+1 757 864 8342 (f)
> > > 
> > > 