[OAI-implementers] Harvesting- how to efficiently poll?

Simeon Warner simeon at cs.cornell.edu
Mon May 1 09:38:05 EDT 2006


I'm not sure I understand your question properly. However, I think it 
is reasonable to assume that any repository that exposes only 
day-granularity datestamps is not updated more frequently than daily. I'd 
poll at most once a day, specifying a 'from' parameter equal to the 
previous day. That one increment of overlap is necessary to ensure 
nothing is missed.
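
To make the overlap concrete, here is a minimal Python sketch, assuming 
daily polling and a repository with YYYY-MM-DD granularity. The base URL 
and the function names are hypothetical, chosen only for illustration; 
the 'verb', 'metadataPrefix' and 'from' arguments are the standard 
OAI-PMH ones. Because the overlap day is requested twice, the harvester 
should expect to see some records again and skip them by identifier and 
datestamp.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def from_for_poll(poll_day: date) -> str:
    """Return the 'from' value for a daily poll: the previous day.

    Re-requesting the previous day is the one increment of overlap.
    """
    return (poll_day - timedelta(days=1)).isoformat()


def list_records_url(base_url: str, poll_day: date,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build the first ListRecords request for this poll.

    base_url is a placeholder; substitute the repository's real base URL.
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": from_for_poll(poll_day),
    }
    return base_url + "?" + urlencode(params)


def keep_new(seen: dict, records):
    """Drop records already harvested during the overlapping day.

    seen maps identifier -> datestamp of the copy we already hold.
    records is an iterable of (identifier, datestamp) pairs.
    """
    fresh = []
    for identifier, datestamp in records:
        if seen.get(identifier) != datestamp:
            seen[identifier] = datestamp
            fresh.append((identifier, datestamp))
    return fresh
```

For a poll on 2006-05-01, list_records_url("http://example.org/oai", 
date(2006, 5, 1)) asks for everything from 2006-04-30 onward. Any 
resumptionTokens in the responses are then followed as usual.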

(As an aside, it is amazing to see how many RSS clients poll arXiv.org 
very frequently when we do include the standard headers saying that we 
update daily and give a time. One might have hoped that these headers 
would increase efficiency but that does not seem to be playing out in 
practice.)

-- 
Simeon


On Sat, 29 Apr 2006, steve racker wrote:
> If the granularity of an archive is YYYY-MM-DD and there are
> many records per day, how can one efficiently poll for the
> newest records?  I would have expected there to be a way to
> specify the last seen record and get any newer records, but
> it appears the only method is to first make a request with the
> date then keep requesting on any encountered resumptionTokens.
> When a response is received with no resumptionToken, keep
> it until it expires, then the next poll starts with the date
> again.  Is this correct? That seems to generate much repeated
> data in responses when polling with the last resumptionToken.
>
>

