[OAI-implementers] How to verify a download worked?

Simeon Warner simeon@cs.cornell.edu
Mon, 4 Mar 2002 10:41:58 -0500 (EST)


Alan,

The reason you got only 60k records from arXiv is probably a problem in my
implementation: it did not correctly handle a 'from' date that early (now
fixed; someone else pointed it out last week too). I don't know of any
established way to verify that a harvest succeeded, but I would suggest
that harvesting with no 'from' and 'until' parameters is more robust than
picking an arbitrary 'from' date.
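
A minimal Python sketch of that suggestion, not taken from arXiv's or anyone else's harvester: issue ListRecords with only a metadataPrefix, then follow resumptionTokens until the server returns none or an empty one. It assumes an OAI-PMH 2.0 server (2.0 post-dates this message, and 1.x responses used different namespaces). The function names, the `oai_dc` default and the injectable `fetch` parameter are illustrative choices. It raises on an OAI-PMH `error` element so that, for example, an expired resumptionToken cannot silently look like the end of the list.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: OAI-PMH 2.0 response namespace (1.x servers used other namespaces).
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_url(url):
    """Fetch one OAI-PMH response over HTTP and return the raw body (bytes)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def harvest_all_identifiers(base_url, fetch=fetch_url, metadata_prefix="oai_dc"):
    """Harvest a whole repository with ListRecords and no from/until arguments.

    Returns the identifier of every record header seen, in order. Headers of
    deleted records (status="deleted") are included, so the total may differ
    from a count that only covers live records.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    identifiers = []
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(params)))
        error = root.find(OAI_NS + "error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                break  # an empty result, not a failure
            # Anything else (e.g. badResumptionToken) means the harvest is incomplete.
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        for header in root.iter(OAI_NS + "header"):
            ident = header.find(OAI_NS + "identifier")
            if ident is not None and ident.text:
                identifiers.append(ident.text.strip())
        token = root.find(".//" + OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no token, or an empty one: the list is complete
        # A resumption request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
    return identifiers
```

For example, `harvest_all_identifiers("http://example.org/oai")`, where the URL is a placeholder for the repository's base URL. Passing a different `fetch` lets the loop be tested against canned responses without touching the network.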

Cheers,
Simeon.

On Mon, 4 Mar 2002, Alan Kent wrote:
> Hi All,
> 
> I was wondering if anyone has good schemes for verifying if a download
> of metadata 'worked'. For example, I crawled the arXiv site and got
> around 60,000 records. However, it turns out the site actually has
> 190,000 or so records. So I only got 1/3 of the site!
> 
> Has anyone used any clever tricks to verify how well a crawl worked?
> I now have to work out if my crawler has been discarding two in every three
> records! :-(
> 
> Alan
>
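
One rough answer to the question above, sketched in Python under stated assumptions: count the unique identifiers harvested and compare them against a figure from the server. OAI-PMH 2.0 (later than this thread) lets a server report an optional `completeListSize` attribute on its resumptionToken; where that is missing, the expected count has to come from somewhere else, such as the roughly 190,000 records mentioned. The function names `reported_list_size` and `check_harvest` are hypothetical. Duplicates are reported separately because a crawler that re-fetches pages can look complete by raw count while still missing records.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: OAI-PMH 2.0 response namespace.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def reported_list_size(response_xml):
    """Return completeListSize from a response's resumptionToken, or None."""
    root = ET.fromstring(response_xml)
    token = root.find(".//" + OAI_NS + "resumptionToken")
    if token is None or token.get("completeListSize") is None:
        return None  # the attribute is optional, so many servers omit it
    return int(token.get("completeListSize"))


def check_harvest(identifiers, expected_count):
    """Compare harvested identifiers against an expected record count."""
    counts = Counter(identifiers)
    unique = len(counts)
    return {
        "harvested": len(identifiers),
        "unique": unique,
        "expected": expected_count,
        # Records the server claims to have that were never seen.
        "missing": max(expected_count - unique, 0),
        # Identifiers seen more than once, e.g. from re-fetched pages.
        "duplicates": sorted(i for i, n in counts.items() if n > 1),
        "fraction": unique / expected_count if expected_count else None,
    }
```

With 60,000 unique identifiers and an expected count of 190,000, `check_harvest` would report 130,000 missing and a fraction of about 0.32, which is the kind of shortfall described above.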