[OAI-implementers] resumptionToken Implementation

Hussein Suleman hussein at cs.uct.ac.za
Tue Sep 28 13:24:28 EDT 2004


hi

in terms of reshuffling, if an object is resent, that is ok - the 
harvester should assume it is a more up-to-date version. what is more 
serious is if an object not yet sent is shuffled into a previously-sent 
position ... hopefully a repository is designed such that no items ever 
move "back" in the logical sequencing, to prevent this problem occurring 
in repositories with linear sequences of chunks associated with 
resumptionTokens. shuffling is simply not a good idea :)
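
as a rough sketch of the harvester side of this (not taken from any particular harvester), the function below merges each incoming chunk into a store keyed by identifier and lets a resent record replace the stored copy unless it carries an older datestamp. the record layout (a dict with "identifier" and "datestamp" keys) and the name merge_chunk are assumptions made for illustration; the two datestamp formats are the day and seconds granularities the protocol allows.

```python
from datetime import datetime

# The two datestamp granularities allowed by OAI-PMH.
_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d")


def parse_datestamp(text):
    """Parse an OAI-PMH datestamp in seconds or day granularity."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised datestamp: {text!r}")


def merge_chunk(store, chunk):
    """Merge one harvested chunk into `store` (hypothetical helper).

    `store` maps identifier -> (parsed datestamp, record). A record that
    is sent again replaces the stored copy unless its datestamp is older.
    """
    for record in chunk:
        ident = record["identifier"]
        stamp = parse_datestamp(record["datestamp"])
        previous = store.get(ident)
        if previous is None or stamp >= previous[0]:
            store[ident] = (stamp, record)
    return store
```

with this in place a duplicate costs nothing beyond the transfer: the store ends up with one entry per identifier regardless of how often a record was resent.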

in terms of new objects, yes, it is assumed they are caught next time a 
harvest run is conducted.
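
a minimal sketch of how the next run might pick these up: the harvester remembers when its previous harvest started and passes that time as the "from" argument of ListRecords, so anything added or changed while the previous harvest was still running falls inside the new window. the function name, the example base URL and the use of seconds granularity are assumptions here; a repository that only supports day granularity would need a YYYY-MM-DD value instead.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def next_harvest_url(base_url, previous_start, metadata_prefix="oai_dc"):
    """Build a ListRecords request covering everything since `previous_start`.

    `previous_start` is a timezone-aware datetime marking when the previous
    harvest began (hypothetical bookkeeping kept by the harvester).
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        # OAI-PMH datestamps are expressed in UTC.
        "from": previous_start.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        ),
    }
    return base_url + "?" + urlencode(params)


# Example with a made-up repository address.
url = next_harvest_url(
    "http://example.org/oai",
    datetime(2004, 9, 28, 17, 24, 28, tzinfo=timezone.utc),
)
```

using the start time rather than the end time of the previous run means some records are fetched twice, which is harmless if the harvester treats a resent record as the newer version.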

in general, as long as the data provider follows the specification of 
the protocol and the implementation guidelines, the harvester should be 
able to straighten out any apparent inconsistencies it detects.
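
on the data provider side, one possible way to keep items from moving "back" when the query is regenerated for every request is to order records by (datestamp, identifier) and have the resumptionToken carry the last pair sent. this is only an illustration under stated assumptions: the in-memory list, the "datestamp|identifier" token layout and the name list_chunk are all hypothetical, datestamps are assumed to share one granularity so they sort as strings, and a real token would also need URL-encoding.

```python
def list_chunk(records, token=None, size=100):
    """Return (chunk, next_token) for a regenerated, stably ordered query.

    `records` is a list of dicts with "identifier" and "datestamp" keys.
    `token` is the hypothetical "datestamp|identifier" cursor from the
    previous response, or None for the first request.
    """
    ordered = sorted(records, key=lambda r: (r["datestamp"], r["identifier"]))
    if token:
        stamp, ident = token.split("|", 1)
        # Keep only records strictly after the last one already sent.
        ordered = [
            r for r in ordered
            if (r["datestamp"], r["identifier"]) > (stamp, ident)
        ]
    chunk = ordered[:size]
    if len(ordered) > size:
        last = chunk[-1]
        next_token = f'{last["datestamp"]}|{last["identifier"]}'
    else:
        next_token = None  # final chunk of the list
    return chunk, next_token
```

because an update gives a record a later datestamp, a changed record can only move forward in this ordering: an already-sent record is sent again later, and an unsent record is never shuffled into a position the harvester has passed.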

hope this is useful ...

ttfn,
----hussein


Jeff Pearson wrote:

> Michael,
> 
> Thanks for the reply. Question then;
> 
> Sample case:
> 
> Harvester issues a query. DP sends back 100 out of 10,000 results. 
> Harvester then begins to request the consecutive chunks. Given that the 
> total data set is 10,000, this will probably take a while. Before the 
> entire result set is transferred, the DP updates its repository, which 
> shuffles the order in which the results are returned. Objects that were 
> transferred previously are now kicked back to a later position, so they 
> are included in a chunk later requested by the harvester.
> 
> Does the DP now invalidate the resumptionToken or does it assume the 
> Harvester will de-dupe objects on its side?
> 
> What about the new objects that have been added and are in chunks of the 
> result set already transferred? Is it assumed that they will be caught 
> the next time around, given that the modifydate SHOULD be later than the 
> last harvest date? Or is it the harvester's responsibility to straighten 
> this all out?
> 
> 
> Jeff Pearson
> University of Southern California
> 
> On Sep 28, 2004, at 9:54 AM, Michael Nelson wrote:
> 
>> On Tue, 28 Sep 2004, Jeff Pearson wrote:
>>
>>> I guess I misstated my query last time. I understand the
>>> implementations as defined in the spec; either create a data result
>>> cache and hit that or regenerate the query each time. What I was
>>> wondering was which people were choosing to implement and why.
>>
>>
>> people are mostly doing the latter approach; only folks w/ systems that
>> natively support result sets are doing the former approach.
>>
>> regards,
>>
>> Michael
>>
>>>
>>>
>>> _______________________________________________
>>> OAI-implementers mailing list
>>> List information, archives, preferences and to unsubscribe:
>>> http://openarchives.org/mailman/listinfo/oai-implementers
>>>
>>
>> ----
>> Michael L. Nelson mln at cs.odu.edu http://www.cs.odu.edu/~mln/
>> Dept of Computer Science, Old Dominion University, Norfolk VA 23529
>> +1 757 683 6393 +1 757 683 4900 (f)
> 

-- 
=====================================================================
hussein suleman ~ hussein at cs.uct.ac.za ~ http://www.husseinsspace.com
=====================================================================



