[OAI-implementers] Better resumption mechanism - more important than ever!

Wed, 6 Mar 2002 17:44:12 +1100

On Tue, Mar 05, 2002 at 09:53:54PM -0500, Xiaoming Liu wrote:
> 1) Could the resumptionToken (in your case restartToken) be re-used? 
> 
> I agree the retry algorithm is theoretically unsafe in current protocol,
> thanks. However, the same question also exists in "restartToken" and 
> must be addressed before we talk about question 2. If they can not be
> re-used, the harvester has to start from scratch. It looks like the OAI
> 1.1 doesn't give clear answer to this question. Hopefully it
> could be answered in 2.0

*If* restartToken were introduced, it would be idempotent-ish. Its whole
purpose would be that it can be reissued and the old results come back.
I would define it as "returning all of the records not returned so
far in the current transfer, possibly including other records that
have already been transferred".
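
A minimal sketch of what that definition asks of the harvester, in
Python. The function name and the record layout are made up for
illustration and are not part of any OAI specification. The point is
only that keying records by identifier makes re-sent records harmless:

```python
def merge_batch(harvested, batch):
    """Fold one response into the records collected so far.

    Records are keyed by identifier, so a record that is sent again
    after a reissued token simply overwrites its earlier copy.
    """
    for record in batch:
        harvested[record["identifier"]] = record
    return harvested


# First response, then the same token reissued after a failure: the
# provider resends record "b" along with the record not yet returned.
harvested = {}
merge_batch(harvested, [{"identifier": "a"}, {"identifier": "b"}])
merge_batch(harvested, [{"identifier": "b"}, {"identifier": "c"}])
print(sorted(harvested))
```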

> 2) If it is legal to re-use, should we introduce a restartToken concept?
> 
> My personal opinion is restartToken will bring too much complexity, and
> it's not necessary.

I agree the consensus is to stick to resumptionTokens. That's fine.
I was just pushing restartToken to see what problems/issues arise.

I would therefore instead propose that there be a standard way in
the Identify response to say 'resumptionTokens are idempotent'
and also 'resumptionTokens can be rerequested' in case of network
failure. DP implementors *should* also try to make them long-lived
(days to weeks) for large repositories.
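
To illustrate how a harvester might use such a declaration, here is a
rough Python sketch. The `rerequestable` argument stands in for the
proposed Identify flag, which does not exist in the protocol today;
`fetch` is any callable that issues the request, and all names are
hypothetical:

```python
import time


def fetch_with_retry(fetch, token, rerequestable, max_attempts=3, delay=0.0):
    """Call fetch(token), retrying on network errors only if allowed.

    `rerequestable` stands in for the hypothetical Identify flag. When
    it is False, the first failure is re-raised so the caller can
    restart the harvest from scratch.
    """
    attempts = max_attempts if rerequestable else 1
    for attempt in range(1, attempts + 1):
        try:
            return fetch(token)
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

If the repository does not declare its tokens rerequestable, the
harvester makes a single attempt and leaves recovery to the caller.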

> In your case, I could imagine it can be done by current OAI 
> resumptionToken: assume the proposed tokens in your suggestion are called
> alan_restartToken and  alan_resumptionToken respectively. 
> 
> oai_resumptionToken=alan_restartToken + alan_resumptionToken
> 
> So data provider (DP) can always parse the oai_resumptionToken; in most
> cases, the session is valid and DP just uses alan_resumptionToken; if
> anything goes wrong and DP needs to redo the query, DP has the freedom to
> use the alan_restartToken. The harvester should not know what happens
> behind the scenes. In this scenario, the time-to-live could be month, year ;-)

That works. I can use it with my Z39.50 result set/query. If the
result set is still around, reuse it. If it has timed out, redo the
query. I would not be able to return the number of records (when
redoing the query, the number might change), but overall things
would work.
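
A rough sketch of that reuse-or-redo logic on the DP side, in Python.
Everything here is an assumption for illustration: the JSON token
layout, the in-memory `RESULT_SETS` cache standing in for a Z39.50
result set, the batch size, and `run_query`. It also assumes the query
returns records in a stable order; otherwise resuming by offset after a
redo could skip or repeat records.

```python
import json

RESULT_SETS = {}  # session id -> list of records (stand-in for a result set)
BATCH = 2         # records per response, chosen arbitrarily


def run_query(query, catalogue):
    """Stand-in for the real search: records whose set matches the query."""
    return [r for r in catalogue if r["set"] == query]


def make_token(query, session, offset):
    # Restart part (query) plus resumption part (session, offset),
    # packed into one string that is opaque to the harvester.
    return json.dumps({"query": query, "session": session, "offset": offset})


def resume(token, catalogue):
    """Return (records, next_token_or_None) for a composite token."""
    t = json.loads(token)
    results = RESULT_SETS.get(t["session"])
    if results is None:
        # Result set has timed out: redo the query. The total number of
        # records may differ from what the original query produced.
        results = run_query(t["query"], catalogue)
        RESULT_SETS[t["session"]] = results
    start = t["offset"]
    batch = results[start:start + BATCH]
    nxt = start + BATCH
    more = nxt < len(results)
    return batch, (make_token(t["query"], t["session"], nxt) if more else None)
```

The harvester only ever sees the one token; whether the DP reused the
result set or redid the query is invisible to it.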

Alan