[OAI-implementers] Using dates other than metadata record creation date for data provider "from" and "until" searches

Rob Tice rob.tice at k-int.com
Fri Apr 11 12:20:30 EDT 2008


Hi Kat

I am not saying that harvesting software shouldn't be robust or be able to
recover from errors (where possible). However, I still reckon that
performing some (pretty rudimentary) data analysis and adhering to a few
basic principles when exposing your data can make harvesting easier.

In my experience as a harvester (when dealing with large collections), I
would much rather issue, in sequence, (for example) 50 date-bounded
requests (starting with the earliestDatestamp from the Identify
response). 50 x 4000 is much better than 1 x 200,000 IMHO :)
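
A minimal sketch of the date-bounded approach, assuming a hypothetical base URL, the `oai_dc` metadata prefix and day granularity (the function names are illustrative, not from any harvester): it splits the span from the Identify response's earliestDatestamp up to a chosen end date into consecutive `from`/`until` windows and builds one ListRecords request per window. Because `from` and `until` are inclusive in OAI-PMH, each window ends the day before the next one starts, so no day is requested twice.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def date_windows(earliest: date, latest: date, n_windows: int) -> list:
    """Split the inclusive day range [earliest, latest] into at most
    n_windows consecutive, non-overlapping (from, until) pairs."""
    if latest < earliest or n_windows < 1:
        return []
    total_days = (latest - earliest).days + 1
    n = min(n_windows, total_days)  # a window cannot be shorter than one day
    base, extra = divmod(total_days, n)
    windows = []
    start = earliest
    for i in range(n):
        length = base + (1 if i < extra else 0)  # spread leftover days over the first windows
        end = start + timedelta(days=length - 1)
        windows.append((start, end))
        start = end + timedelta(days=1)  # from/until are inclusive, so no overlap
    return windows


def list_records_urls(base_url: str, windows, metadata_prefix: str = "oai_dc") -> list:
    """Build one date-bounded ListRecords request URL per window."""
    return [
        base_url + "?" + urlencode({
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,
            "from": start.isoformat(),
            "until": end.isoformat(),
        })
        for start, end in windows
    ]


if __name__ == "__main__":
    # Hypothetical repository; earliestDatestamp would come from verb=Identify.
    wins = date_windows(date(2007, 1, 1), date(2008, 4, 11), 50)
    for url in list_records_urls("http://example.org/oai", wins)[:2]:
        print(url)
```

Note that this only helps if the records are actually spread across those days: no window can be narrower than the repository's granularity.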

Cheers

Rob



> -----Original Message-----
> From: Kat Hagedorn [mailto:khage at umich.edu]
> Sent: 11 April 2008 15:01
> To: Rob Tice; lisa at issuelab.com
> Cc: oai-implementers at openarchives.org
> Subject: Re: [OAI-implementers] Using dates other than metadata record
> creation date for data provider "from" and "until" searches
> 
> I'm not sure I agree. Any robust harvester software should be able to
> *initially* harvest regardless of datestamp.
> 
> Problems such as network connectivity, etc. should be addressed in the
> harvesting software so that harvesting is as close to seamless as
> possible. For example, software should have a timeout feature that
> allows for network issues (e.g., waits 1 minute before trying again).
> If robust software is not able to handle these problems, it's most
> likely an issue on the data provider side, in my experience.
> 
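The timeout-and-retry behaviour described above might look like the following minimal sketch. The function name, the three-attempt default and the injectable `fetch`/`sleep` callables are assumptions for illustration; the one-minute default wait follows the example in the message. It retries on any `OSError`, which in Python covers `urllib.error.URLError`, `HTTPError` and socket timeouts. The OAI-PMH specification also lets a repository answer 503 with a Retry-After header for flow control; a fuller harvester would honour that delay rather than a fixed wait.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[str], T],
    url: str,
    attempts: int = 3,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(url); on a network-level failure, wait and try again,
    re-raising the last error once all attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError as err:  # URLError, HTTPError and timeouts are all OSErrors
            last_error = err
            if attempt < attempts:
                sleep(wait_seconds)  # e.g. wait 1 minute before trying again
    raise last_error


# Possible usage with the standard library (hypothetical repository URL):
# from urllib.request import urlopen
# def http_get(url):
#     with urlopen(url, timeout=30) as resp:
#         return resp.read()
# body = fetch_with_retry(http_get, "http://example.org/oai?verb=Identify")
```
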
> Regards,
>  -Kat
> 
> 
> On 4/11/08 5:53 AM, "Rob Tice" <rob.tice at k-int.com> wrote:
> 
> > Hi Lisa
> >
> > I find that it is always helpful to put yourself in the shoes of
> > someone who wants to harvest your data.
> >
> > If the smallest record count that can be obtained from your system as
> > the result of an initial OAI ListRecords request (including dates
> > and/or sets) is very large, it can be quite difficult for harvesting
> > systems to successfully complete an 'initial population' from your
> > data without other influences (network connectivity, target response
> > time, resumption token lifetime, etc.) having an increasing bearing on
> > the successful outcome of the harvest.
> >
> > For example, a repository containing 200,000 records, all datestamped
> > the same day, that does not support a request granularity finer than
> > one day makes initial population more difficult for any harvesting
> > system :)
> >
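One way to check whether a repository has this problem is to count how many records share a datestamp at the granularity it advertises in its Identify response; the largest such group is the smallest request a harvester can be forced into, however it slices the dates. A minimal sketch, assuming the datestamps are available as UTC strings in the two OAI-PMH formats (the function name is hypothetical):

```python
from collections import Counter


def largest_unsplittable_batch(datestamps, granularity: str = "YYYY-MM-DD") -> int:
    """Return the most records that share one datestamp at the given
    granularity, i.e. the biggest batch no from/until window can split."""
    if granularity == "YYYY-MM-DD":
        key_len = 10   # e.g. "2008-04-12"
    elif granularity == "YYYY-MM-DDThh:mm:ssZ":
        key_len = 20   # e.g. "2008-04-12T09:30:00Z"
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    # Truncate each datestamp to the granularity and count the groups.
    counts = Counter(stamp[:key_len] for stamp in datestamps)
    return max(counts.values(), default=0)
```

With 200,000 records all stamped the same day and only day granularity, this returns 200,000: the whole repository has to come back from a single request and its chain of resumption tokens.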
> > I do not know how many records you have, so this may not be an issue
> > for you, but I think it is worth bearing in mind.
> >
> > Cheers
> >
> > Rob
> >
> >
> >
> >
> > From: oai-implementers-bounces at openarchives.org
> > [mailto:oai-implementers-bounces at openarchives.org] On Behalf Of Frederic MERCEUR
> > Sent: 11 April 2008 07:41
> > To: lisa at issuelab.com
> > Cc: oai-implementers at openarchives.org
> > Subject: Re: [OAI-implementers] Using dates other than metadata record
> > creation date for data provider "from" and "until" searches
> >
> > Hello,
> > As far as I understand the OAI protocol, I would rather say that the
> > datestamp is the last time your record was changed (which must then
> > reflect any "create", "update" or "delete").
> > When you first register your archive with a harvester, I guess the
> > harvester will first get all available records. To do so, it will
> > query your archive without the "from" and "until" parameters.
> > Then, most harvesters will regularly run incremental harvests to get
> > the records modified, deleted or added since the previous harvest. To
> > do so, they will run the query with the "from" parameter.
> > Kind regards,
> > Fred
> >
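The initial-then-incremental pattern described above can be sketched as a single generator: with no `from_date` it performs the full initial harvest, and with the date of the previous harvest it performs an incremental one. This is a minimal sketch, not any particular harvester's code; the function name, the injectable `fetch` callable and the `oai_dc` default are assumptions. It follows OAI-PMH resumptionTokens until an absent or empty token marks the end of the list, treats a `noRecordsMatch` error as "nothing new", and raises on any other protocol error.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records(
    fetch: Callable[[str], bytes],
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
) -> Iterator[ET.Element]:
    """Yield every <record> from ListRecords, following resumptionTokens.

    from_date=None gives the initial full harvest; passing the date of the
    previous harvest gives an incremental harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    url = base_url + "?" + urlencode(params)
    while True:
        root = ET.fromstring(fetch(url))
        error = root.find(OAI_NS + "error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # nothing added, changed or deleted in the range
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        list_elem = root.find(OAI_NS + "ListRecords")
        if list_elem is None:
            return
        yield from list_elem.findall(OAI_NS + "record")
        token = list_elem.find(OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # an absent or empty token ends the list
        # A resumption request carries only the verb and the token.
        url = base_url + "?" + urlencode(
            {"verb": "ListRecords", "resumptionToken": token.text.strip()})
```
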
> >
> > Lisa M. Brooks wrote:
> > Hello - We're very close to launching our data provider. Before we do,
> > I have a question about datestamps.
> >
> > I understand that the "from" and "until" dates used to request
> > metadata records refer to the date that the metadata record was
> > created. We are an archive of research works that date back to the
> > 1980s (we will definitely get even older works into our archive as we
> > move forward). To my mind it would be more helpful to folks if our
> > record datestamps reflected the date the research work in question was
> > first published.
> >
> > My concern is that when we introduce our repository, harvesters won't
> > get the gist of the temporal scope of our collection, because
> > everything is datestamped en masse with the date that we generate our
> > metadata records (which, with luck, will be this Saturday).
> >
> > I hope I'm making sense! I just want to know if this is a big no-no,
> > or if there are things to consider before doing something like this.
> > I'd appreciate the insight of list participants.
> >
> > Thanks for reading -
> > ~Lisa
> >
> > Lisa M. Brooks
> > IssueLab - bringing nonprofit research into focus
> > lisa at issuelab.org
> > 773-649-1790
> > http://www.issuelab.org
> >
> >
> > _______________________________________________
> > OAI-implementers mailing list
> > List information, archives, preferences and to unsubscribe:
> > http://www.openarchives.org/mailman/listinfo/oai-implementers
> >
> >
> >
> 
> 
> -------------------
> Kat Hagedorn
> OAIster/Metadata Harvesting Librarian
> DLXS Bibliographic Class Coordinator
> Digital Library Production Service
> University of Michigan
> 
> http://www.oaister.org/
> http://www.dlxs.org/
> email: khage at umich.edu
> phone: 734-615-7618
> 
> No virus found in this incoming message.
> Checked by AVG.
> Version: 7.5.519 / Virus Database: 269.22.12/1372 - Release Date:
> 10/04/2008 17:36
> 
