[OAI-implementers] harvester guidelines

Samuele Kaplun Samuele.Kaplun at cern.ch
Thu May 26 11:40:53 EDT 2011

Hi Jasper,

On Thu, 26/05/2011 at 17:26 +0200, Jasper Op de Coul wrote:
> On 05/26/2011 02:05 PM, Samuele Kaplun wrote:
> > On Thu, 26/05/2011 at 12:43 +0200, Jasper Op de Coul wrote:
> >> 3. Use incremental harvests, but never use the ?set param. The client
> >> will receive all records and can inspect the SetSpec header manually to
> >> see if this record is part of the wanted set. Records that are not part
> >> of the wanted set but are in the client database can be removed.
> >
> > this sounds like a nice idea, but it would not fully address the case
> > when the union of all sets in the repository is just a subset of the
> > whole record universe. If a record leaves a set and doesn't enter any
> > other set, then it will not be deleted, but it won't be exported
> > either when the set param is not specified. So unfortunately even
> > with your solution this situation would not be solved :-(
> I'm not sure I follow you correctly. Do you mean that records
> without any setspec never show up in the feed? I don't think this is
> the case. Maybe you mean that if only the setspec changes but not the
> metadata, then the datestamp might not be updated?

In my scenario, I was assuming that not all records in a repository are
actually exported via OAI-PMH (e.g. this is the case for Invenio
instances such as the CERN Document Server). In our implementation, only
records that belong to at least one set are actually exported. So if a
record was at some point available in a set, it would also have been
exported when the ?set= argument was missing. On the other hand, if the
record is removed from the set and is no longer exported at all (but is
still in the repository), then this event will not be advertised in any
way (in incremental harvesting).

In general this problem shows up whenever a record is no longer
exported but is still available in the repository (and hence can't
really be considered formally "deleted").
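A minimal sketch of Jasper's option 3 and of the gap described above, assuming Python's standard library and a ListRecords response that has already been fetched as a string. The local store (a dict from OAI identifier to setSpecs), the function name `apply_incremental_harvest`, and the sample identifiers are all hypothetical. The client keeps records whose header lists the wanted setSpec and drops those that no longer do; a record that has left every set is simply absent from the response, so the client never hears about it.

```python
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 XML namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def apply_incremental_harvest(store, response_xml, wanted_set):
    """Update a hypothetical local store from one ListRecords response.

    `store` maps OAI identifiers to their setSpecs. A record is kept
    only if its header lists `wanted_set`; records that no longer do,
    or that are flagged status="deleted", are removed from the store.
    """
    root = ET.fromstring(response_xml)
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        set_specs = {s.text for s in header.findall("oai:setSpec", OAI_NS)}
        deleted = header.get("status") == "deleted"
        if deleted or wanted_set not in set_specs:
            store.pop(identifier, None)  # left the wanted set, or deleted
        else:
            store[identifier] = set_specs
    return store


# Before: the client holds two records from the set "physics".
store = {
    "oai:example.org:1": {"physics"},
    "oai:example.org:2": {"physics"},
}

# Incremental response: record 1 moved from "physics" to "maths".
# Record 2 left every set, so a repository that only exports records
# belonging to at least one set omits it entirely.
response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2011-05-26</datestamp>
        <setSpec>maths</setSpec>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

apply_incremental_harvest(store, response, "physics")
print(store)  # {'oai:example.org:2': {'physics'}}  (record 2 is stale)
```

Nothing in the incremental response lets the client evict record 2; only a full reharvest would reveal that it is no longer exported.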

> Yes, but you can keep doing incremental harvests instead of throwing
> everything away and doing a full reharvest every month. So it is not
> that clear which scenario consumes the most bandwidth.

Indeed :-)
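In OAI-PMH terms, an incremental harvest is a ListRecords request carrying a `from` datestamp (typically the time of the previous harvest), while a full reharvest omits it; both then follow resumptionTokens until the list is exhausted. A minimal sketch of building those request URLs with the standard library; the helper name `list_records_url` and the base URL are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None,
                     set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument, so when it is
    given every other argument except the verb is dropped.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords",
                  "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date  # incremental: only changed records
        if set_spec is not None:
            params["set"] = set_spec    # option 3 deliberately omits this
    return base_url + "?" + urlencode(params)


BASE = "http://repository.example.org/oai"  # hypothetical endpoint

# Full reharvest: no `from`, so every exported record is sent again.
print(list_records_url(BASE, "oai_dc"))
# http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc

# Incremental harvest without `set` (option 3): only records whose
# datestamp changed since the given date.
print(list_records_url(BASE, "oai_dc", from_date="2011-05-01"))
# http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2011-05-01
```

Which strategy transfers less depends on how many records change between harvests versus the total number exported, which is why neither is clearly cheaper in general.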

> Ah, that sounds very interesting indeed. I won't be attending OAI7 this
> time since I opted for the EuroPython conference in Florence, which is
> in the same week. I'll suggest the talk to my colleague who is going.

Ooh... this one looks very interesting too!


Samuele Kaplun
Invenio Developer ** <http://invenio-software.org/>
