[OAI-implementers] How to verify a download worked?

Simeon Warner simeon@cs.cornell.edu
Mon, 4 Mar 2002 10:41:58 -0500 (EST)


The reason you got just 60k records from arXiv is probably linked to the
problem of specifying a date too early for my implementation to understand
correctly (now fixed; someone else pointed it out last week too). I don't
know of any ways to verify successful harvesting, but I would suggest that
doing a harvest with no 'from' and 'until' parameters is more robust than
picking an arbitrary 'from' date.
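
As a rough sketch of the suggestion above, a harvester could build its
ListRecords request so that 'from' and 'until' are simply left out unless
the caller supplies them. The base URL and the helper name
build_list_records_url are hypothetical, chosen only for illustration;
'oai_dc' is assumed as the metadata format.

```python
from urllib.parse import urlencode

# Hypothetical base URL, for illustration only; substitute the repository's real one.
BASE_URL = "http://example.org/oai"


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, until_date=None):
    """Build a ListRecords request URL, omitting 'from'/'until' unless given."""
    params = [("verb", "ListRecords"), ("metadataPrefix", metadata_prefix)]
    # Only add the date bounds when the caller explicitly asks for them,
    # so the default request covers the whole repository.
    if from_date is not None:
        params.append(("from", from_date))
    if until_date is not None:
        params.append(("until", until_date))
    return base_url + "?" + urlencode(params)


# Full harvest: no 'from' and no 'until'.
full_harvest_url = build_list_records_url(BASE_URL)
print(full_harvest_url)
```

With no dates supplied, the request carries only the verb and the metadata
prefix, so the harvester does not depend on how a repository interprets an
arbitrarily early 'from' date.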


On Mon, 4 Mar 2002, Alan Kent wrote:
> Hi All,
> I was wondering if anyone has good schemes for verifying if a download
> of metadata 'worked'. For example, I crawled the arXiv site and got
> around 60,000 records. However, it turns out the site actually has
> 190,000 or so records. So I only got 1/3 of the site!
> Has anyone used any clever tricks to verify how well a crawl worked?
> I now have to work out if my crawler has been discarding one in three
> records! :-(
> Alan