[OAI-implementers] ListRecords request w/out an until..

Benjamin Anderson benanderson.us at gmail.com
Tue Feb 1 10:36:55 EST 2011


Ugh, sorry, I pushed send too quickly.  The two bullet points after the
sentence I quoted clear it up (at least with regard to what the spec
defines).  It still seems a little ambiguous, though, doesn't it?  To
harvest in the most correct way you almost have to know how the provider is
implemented, which kind of defeats the purpose of a spec.  I'm still curious
whether there's a de facto standard that most providers are using.
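
A minimal sketch of the edge case described in the quoted message below,
assuming a toy provider that stores only the latest datestamp per record and
re-evaluates the from/until window on every page. The integer timestamps,
the dict-based repo, and the page() helper are all made up for illustration;
they are not taken from the spec or from any real provider.

```python
# Toy repository: identifier -> most recent datestamp (older ones are lost).
repo = {"A": 5, "B": 7}


def page(repo, offset, size, from_=None, until=None):
    """Return one page of identifiers whose current datestamp is in the window."""
    matching = sorted(
        ident
        for ident, stamp in repo.items()
        if (from_ is None or stamp >= from_)
        and (until is None or stamp <= until)
    )
    return matching[offset:offset + size]


t0 = 10  # harvest starts; harvester sends until=t0

# First page is served before the update happens.
first_harvest = page(repo, 0, 1, until=t0)

# Record B is updated mid-harvest; its old datestamp (7) is overwritten.
repo["B"] = 12

# Second page: B no longer falls inside the until=t0 window.
first_harvest += page(repo, 1, 1, until=t0)

# Next harvest starts at t=20 and resumes with from=t0.
second_harvest = page(repo, 0, 10, from_=t0, until=20)

print(first_harvest)   # ['A']
print(second_harvest)  # ['B']
```

In this toy model record B is skipped by the harvest bounded at t0, but it
shows up in the following harvest as long as that one starts with from=t0.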


On Tue, Feb 1, 2011 at 10:26 AM, Benjamin Anderson <benanderson.us at gmail.com
> wrote:

> Thanks Simeon.  I'm looking over the section you linked to...
>
> Repositories that implement resumptionTokens *must* do so in a manner that
>> allows harvesters to resume a sequence of requests for incomplete lists by
>> re-issuing a list request with the most recent resumptionToken
>>
>
> I'm having a hard time understanding this sentence. What is meant by
> "incomplete list"?  What is meant by "re-issuing a list request"?
>
> I was just thinking that my harvester assumption wouldn't work for the
> given scenario:
>
> Let's assume a provider that allows for updates during harvests and that
> this provider only keeps the most recent updated date (not all update
> dates).  If a record was updated before t0 and again after t0 (but before it
> was included in the harvest initiated at t0), then the harvester will not
> get the record even though it should have.  That's probably a rare case, but
> nevertheless bound to happen.  Are there guidelines for the best way to use
> an until as a harvester?
>
> Thanks again,
> Ben
>
>
> On Tue, Feb 1, 2011 at 10:05 AM, Simeon Warner <simeon.warner at cornell.edu> wrote:
>
>> Hi Ben,
>>
>> This is covered in section 3.5.1 of the specification:
>>
>> http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm#Idempotency
>>
>> I think your solution for the harvester is the correct one. Provided the
>> harvester starts again with from=t0 all changes between t0 and t2 will be
>> harvested, irrespective of whether or not they were included in the original
>> response (modulo understood problems with items that move between sets for
>> set selective requests).
>>
>> Cheers,
>> Simeon
>>
>>
>> On 02/01/2011 09:09 AM, Benjamin Anderson wrote:
>>
>>> Hi,
>>>
>>> I'm wondering what others are doing when a ListRecords request w/out an
>>> until comes in.  Consider this scenario:
>>>
>>> t0 - harvest request (with no until) is initiated
>>> t1 - record 101 is added to the repo
>>> t2 - harvest is finished (it took multiple requests to complete)
>>>
>>> Should record 101 be included in the harvest data?  If not, should the
>>> client issue their next harvest with from=t0?  (A from=t2 would be wrong
>>> because they'd miss record 101.)
>>>
>>> We have implemented both OAI-PMH harvesters and providers, so I have to
>>> consider both ends of this.  Here's what I'm thinking...
>>>
>>> As a Provider
>>> I will simply lock the repo so that the above scenario can't happen.  If
>>> someone is already harvesting (there exist unexpired resumptionTokens)
>>> then I will not update the repository.
>>>
>>> As a Harvester
>>> I will always use the until parameter with the value of the time the
>>> harvest was initially started.
>>>
>>> I think this keeps me clear of any problems.  Anyone else have thoughts
>>> or care to share your solutions?
>>>
>>> Thanks,
>>> Ben Anderson
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> OAI-implementers mailing list
>>> List information, archives, preferences and to unsubscribe:
>>> http://www.openarchives.org/mailman/listinfo/oai-implementers
>>>
>>>
>>
>
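
The harvester approach discussed in this thread (send until set to the time
the harvest started, then begin the next harvest with from set to that same
start time) can be sketched as below. This is only an illustration: the base
URL is a placeholder, the helper names are invented, and it assumes the
provider supports seconds granularity and the oai_dc metadata prefix.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode, urlsplit


def oai_timestamp(dt):
    """Format a timezone-aware datetime as a UTC datestamp (seconds granularity)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def list_records_url(base_url, start, previous_start=None,
                     metadata_prefix="oai_dc"):
    """Build the first ListRecords request of a harvest.

    start          -- when this harvest begins; sent as `until`
    previous_start -- when the previous harvest began; sent as `from`
                      (omitted on the very first harvest)
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if previous_start is not None:
        params["from"] = oai_timestamp(previous_start)
    params["until"] = oai_timestamp(start)
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and times, for illustration only.
base = "http://example.org/oai"
t_prev = datetime(2011, 2, 1, 14, 0, 0, tzinfo=timezone.utc)
t_now = datetime(2011, 2, 1, 15, 0, 0, tzinfo=timezone.utc)

url = list_records_url(base, start=t_now, previous_start=t_prev)
query = parse_qs(urlsplit(url).query)
print(query["from"], query["until"])
```

Follow-up requests in the same harvest would carry only the resumptionToken
returned by the provider; the harvester then remembers t_now as the from
value for its next run.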