[OAI-implementers] Support for Tim Cole's comments

Xiaoming Liu liu_x@cs.odu.edu
Tue, 12 Feb 2002 20:37:10 -0500 (EST)


On Wed, 13 Feb 2002, Alan Kent wrote:

> 
> So 100 to 1000 records per chunk seems like a good compromise to me,


From our experience with Arc, I agree that 100-1000 records per chunk is
good. Much more than 1000 records may bring significant memory problems,
considering the XML parsing overhead.
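To illustrate the memory point, here is a minimal sketch (not Arc's actual code) of parsing a large ListRecords response incrementally, so memory use stays roughly flat no matter how many records one chunk holds:

```python
# Illustrative sketch: stream-parse a ListRecords response instead of
# building the whole DOM at once. The tag matching is deliberately loose
# (endswith) so it works regardless of the namespace URI in use.
import io
import xml.etree.ElementTree as ET

def iter_records(xml_bytes):
    """Yield each <record> element, freeing its subtree once consumed."""
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag.endswith("record"):
            yield elem
            elem.clear()  # release the subtree to keep memory bounded
```

The caller processes each record as it arrives, rather than after the whole chunk is in memory.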

I never noticed a "time-out" problem, maybe because our harvester does
continuous harvesting, or perhaps because there is no special error code
for timeouts in OAI-PMH?
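Absent a protocol-level timeout code, a harvester can fall back on plain HTTP behaviour. The sketch below (illustrative, not our harvester's code) uses a client-side socket timeout, retries with backoff, and honours an HTTP 503 Retry-After response if the repository sends one:

```python
# Hypothetical retry wrapper around a single OAI-PMH HTTP request.
import time
import urllib.error
import urllib.request

def fetch(url, timeout=60, max_retries=3):
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code == 503:  # repository asks us to come back later
                time.sleep(int(e.headers.get("Retry-After", "10")))
                continue
            raise
        except OSError:        # socket timeout or other network failure
            time.sleep(2 ** attempt)
    raise RuntimeError("gave up after %d retries" % max_retries)
```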

> everything. OAI (in my opinion) does not do a very good job here
> yet. Because the harvester does not know the date/time stamp
> distribution of data on the source site, it is hard to automatically
> ask for multiple requests to get data from=X to=Y to get reasonable
> chunk sites (for recovery purposes). Instead, I would rather a  
> harvester be able to say 'give me everything', but be given hints
> to help with recovery in case things go wrong before finishing the
> whole transfer. (Hence my suggested optional 'there is more coming,
> but you have everything up to this date guaranteed in case you need
> to start again with a network failure.)

I think the resumptionToken is OAI's current answer to this problem.

In the current implementation, whenever a network failure happens, the
harvester may restart from the last successful "resumptionToken". If it is
lucky and the "resumptionToken" is still valid, everything is fine;
otherwise it may have to harvest everything again. Of course, the
harvester may instead choose to harvest by date or set, which will at
least preserve the finished part.
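The recovery strategy above could be sketched like this (hypothetical code; fetch_chunk, process, and the TokenExpired signal stand in for a real OAI client and its badResumptionToken handling):

```python
# Sketch: checkpoint the last good resumptionToken to disk; on restart,
# resume from it, and if the repository rejects the stale token, fall
# back to re-harvesting from the beginning.
import os

class TokenExpired(Exception):
    """Raised by fetch_chunk when the repository reports the token invalid."""

def harvest(fetch_chunk, process, state_file="last_token.txt"):
    """fetch_chunk(token) -> (records, next_token_or_None)."""
    token = None
    if os.path.exists(state_file):
        token = open(state_file).read().strip() or None
    while True:
        try:
            records, token = fetch_chunk(token)
        except TokenExpired:
            token = None            # token no longer valid: start over
            continue
        process(records)
        if token is None:
            if os.path.exists(state_file):
                os.remove(state_file)   # finished cleanly
            return
        with open(state_file, "w") as f:
            f.write(token)          # checkpoint for crash recovery
```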

And a data provider may keep a "resumptionToken" valid indefinitely by
encoding the query into the token itself, as we discussed in previous
emails.
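One way to do that encoding, sketched below with made-up field names, is to pack the query arguments and a cursor into the token itself, so the provider can honour any token without keeping session state:

```python
# Sketch of a stateless resumptionToken: base64 of "from|until|set|offset".
# Decoding it recovers the full query, so the token never "expires" in the
# session sense -- it can always be re-executed.
import base64

def make_token(frm, until, set_spec, offset):
    raw = "|".join([frm, until, set_spec, str(offset)])
    return base64.urlsafe_b64encode(raw.encode()).decode()

def parse_token(token):
    frm, until, set_spec, offset = (
        base64.urlsafe_b64decode(token).decode().split("|")
    )
    return frm, until, set_spec, int(offset)
```

A plain offset cursor can skip or duplicate records if the repository changes underneath it; encoding the last seen datestamp and identifier instead makes resumption more robust.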

liu