[OAI-implementers] (no subject)

Fri Dec 9 07:01:06 EST 2011

On 09.12.2011 11:28, José Borbinha wrote:
> Hi Thomas
> I’m afraid the issue you are raising is out of the scope of OAI, as IMHO it
> is part of the “upper level” of the business.
>  
> In fact, in a properly designed data/entity architecture, records should
> have two identifiers, one absolute, purely logical, and persistent (like for
> example an ISBN number, DOI, National Bibliographic Number, etc., with no
> need to be sequential…), and possibly another, “technical” one (such as the
> index in the database, for example, which tends to be sequential, and OAI
> takes advantage of that). Both can even be represented by elements in the
> data record…

Well, the actual metadata transported by OAI-PMH may have concepts of
merged or moved records, but we can safely say that oai_dc doesn't.
OAI-PMH itself (the things transported in the header) has (optional)
provisions for deleted records, but not for merging or moving.

I'm quite certain that there are no universally agreed semantics for
merging or moving records, let alone for the impact on their corresponding
identifiers or for analogous operations on the repository level (relabeling,
moving or merging whole repositories ... with no impact on the actual
metadata). One could define an extension format to communicate changes on
the identifier level, but its use would be severely restricted to the
originating repository and its underlying data model and operational
concepts...

Thus, moved identifiers are technically a case of deleting the old
record and creating a new one (and merged records likewise: deliberate
duplication on the OAI level sounds more like a headache than a
viable option).
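
To illustrate, here is a minimal Python sketch of a move expressed as
"delete old, create new" on the header level. The Header class, the
function names and the oai:example.org identifiers are made up for this
example, and the XML is simplified (no namespaces, no setSpec):

```python
from dataclasses import dataclass
from typing import List, Optional
from xml.etree import ElementTree as ET


@dataclass
class Header:
    """Simplified OAI-PMH record header (hypothetical helper class)."""
    identifier: str
    datestamp: str                 # UTC date, e.g. "2011-12-09"
    status: Optional[str] = None   # "deleted" or None


def move_record(old_id: str, new_id: str, datestamp: str) -> List[Header]:
    """Express a move as a deletion of old_id plus a new record new_id."""
    return [
        Header(old_id, datestamp, status="deleted"),
        Header(new_id, datestamp),
    ]


def to_xml(header: Header) -> str:
    """Serialize one header roughly as it would appear in ListIdentifiers."""
    elem = ET.Element("header")
    if header.status:
        elem.set("status", header.status)
    ET.SubElement(elem, "identifier").text = header.identifier
    ET.SubElement(elem, "datestamp").text = header.datestamp
    return ET.tostring(elem, encoding="unicode")


# Example identifiers are assumptions, not taken from any real repository.
for h in move_record("oai:example.org:A", "oai:example.org:B", "2011-12-09"):
    print(to_xml(h))
```

Note that nothing in the two headers links B to A: a harvester only sees
one record vanish and another one appear.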

If the repository does not support deleted records, you would simply have
to relabel your identifiers and shift the earliestDatestamp, forcing all
harvesters to reload everything (and when it comes to identifying
deletions, they have always been on their own). But if you advertise
deleted records, then you'll forever have to document and deliver any old
identifiers as deleted ones.
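
From the harvester's side this could look roughly like the following
Python sketch. It only shows one way a harvester might react, not
something the protocol prescribes; the function names are hypothetical,
and both datestamps are assumed to be ISO 8601 strings of the same
granularity, so that plain string comparison orders them correctly:

```python
from typing import Optional, Set


def plan_harvest(earliest_datestamp: str, last_harvest: Optional[str]) -> str:
    """Decide between a full and an incremental harvest.

    earliest_datestamp: value taken from the repository's Identify response.
    last_harvest: datestamp of our previous harvest, or None if there was none.
    """
    if last_harvest is None or earliest_datestamp > last_harvest:
        # The repository claims to hold nothing as old as our local copy,
        # so everything we have must be treated as stale.
        return "full"
    return "incremental"


def locally_detected_deletions(known_ids: Set[str],
                               reharvested_ids: Set[str]) -> Set[str]:
    """Without deleted-record support, deletions are whatever did not come back."""
    return known_ids - reharvested_ids


print(plan_harvest("2011-12-09", "2011-06-01"))
print(sorted(locally_detected_deletions({"oai:x:A", "oai:x:C"},
                                        {"oai:x:B", "oai:x:C"})))
```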

Another question is whether relabeling of identifiers (or moving whole
repositories) can be realized on the HTTP level: Would harvesters
break if a request for identifier A in repository X returned a redirect
to a response for identifier B (in repository Y)? They probably would...
And with respect to incremental harvesting and ListIdentifiers or
ListRecords requests, a single HTTP status code cannot communicate that
each one of a bunch of returned identifiers would be the redirection
target of a hypothetical GetRecord request for an old identifier
we don't name...

I'd recommend setting up a completely new repository under a different
address, which makes for the most drastic break, and informing all those
who care out of band (mail, website, ...) about the transformations
that must be applied to old identifiers, so that the transition becomes
a seamless one.
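
Such a published transformation could be as small as a prefix rule plus
a table of exceptions. A hedged Python sketch follows; the prefixes, the
MERGED table and the translate function are purely illustrative
assumptions, not part of OAI-PMH or of any existing repository:

```python
from typing import Dict, Optional

# Hypothetical identifier prefixes of the old and the new repository.
OLD_PREFIX = "oai:old.example.org:"
NEW_PREFIX = "oai:new.example.org:"

# Hypothetical exceptions: old records that were merged into another record.
MERGED: Dict[str, str] = {
    "oai:old.example.org:17": "oai:new.example.org:4",
}


def translate(old_id: str) -> Optional[str]:
    """Map an old identifier to its new one, or None if it is not ours."""
    if old_id in MERGED:
        return MERGED[old_id]
    if old_id.startswith(OLD_PREFIX):
        return NEW_PREFIX + old_id[len(OLD_PREFIX):]
    return None


print(translate("oai:old.example.org:23"))
print(translate("oai:old.example.org:17"))
```

A harvester operator who receives the rule can relabel the local copy
once and then harvest the new repository incrementally.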

Best regards
Thomas Berger