[UPS] Problems/Comments with Santa Fe Metadata Set

Carl Lagoze lagoze@cs.cornell.edu
Thu, 18 Nov 1999 08:31:26 -0500


Hi Herbert,

Thanks for your response.  I'm going to review the other response before
deciding what to do about the display URL dilemma.  I think we all agree
that it is problematic, but it may be the expedient and necessary thing to
do.  I spoke to Clifford Lynch a couple of days ago when we appeared at a
meeting together and he felt the same.

Regarding the rest of your letter, you've got the essence of the Dienst
capabilities correct.  We are scheduled to have the definitive protocol
document done by the first week of December.

Carl


 -----Original Message-----
From: 	herbert van de sompel [mailto:herbert.vandesompel@rug.ac.be] 
Sent:	Monday, November 15, 1999 2:58 PM
To:	Carl Lagoze
Cc:	fielding@CS.Cornell.EDU
Subject:	Re: [UPS] Problems/Comments with Santa Fe Metadata Set

Hi Carl & David,

1. ID problem
=============

While compiling a proposal text for the Santa Fe Convention, I actually
ran into the same problem.  More than this, at the meeting, I already
had an odd feeling about this: I didn't fully understand the whole
discussion about the identifiers, even given my background in the SFX
work that deals heavily with them.  

As far as I am concerned there are two crucial elements to solve this
problem, without putting a heavy burden on the archives:
- making the "optional" Object ID mandatory
- requiring the archive to implement and document a link-to-syntax that
uses the Object ID and the knowledge about various instances of objects
as parameters (Marc Doyle calls this a wrapper; Hellman has the SLinkS
syntax to describe such link-to-syntaxes)
(and getting rid of the display ID)
Given both elements, a URL can be constructed that points at the desired
instance of a document.

In the simplest example, the identifier (handle) that can be used to
extract object metadata can also be used to get to the object's full
text, and the link-to-syntax becomes extremely simple (cf. xxx).
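
To make the two requirements concrete, here is a minimal Python sketch
of a link-to-syntax, assuming an archive documents it as a URL template
with placeholders for the Object ID and the desired instance.  The
template strings, the placeholder names (object_id, instance), the
archive names and the example.org hosts are all illustrative
assumptions; they are not any archive's documented syntax and they are
not SLinkS.

```python
from urllib.parse import quote

# Hypothetical link-to-syntaxes, one per archive. Each is a URL template
# that the archive would document; hosts and paths are made up.
LINK_TO_SYNTAX = {
    # Simplest case: the Object ID alone leads to the full text.
    "simple-archive": "http://archive.example.org/abs/{object_id}",
    # Richer case: the caller also selects an instance of the object.
    "multi-format-archive": (
        "http://repo.example.org/get/{object_id}?instance={instance}"
    ),
}


def build_link(archive: str, object_id: str, instance: str = "") -> str:
    """Fill an archive's documented template with an Object ID and instance."""
    template = LINK_TO_SYNTAX[archive]
    # str.format ignores keyword arguments a template does not use, so the
    # simple template can safely be given an (unused) instance value.
    return template.format(
        object_id=quote(object_id, safe="/.:"),
        instance=quote(instance, safe=""),
    )


if __name__ == "__main__":
    print(build_link("simple-archive", "cs.DL/9812020"))
    print(build_link("multi-format-archive", "ncstrl.cornell/TR94-1418", "pdf"))
```

A service that knows only the Object ID and the archive's published
template can build the link itself, so the metadata record does not
need to carry a display URL.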

I think it would go too far to require all archives to implement the
same Dienst-inspired link-to-syntax.  But I feel that the two
requirements mentioned above can guarantee the required result, while
allowing flexibility in local implementation.

2. Dienst specs
===============

Can you give me an indication of when the specs that describe the Santa
Fe subset of Dienst in detail will become available?  Actually, in the
Convention I refer to this as the openArchive subset of Dienst.  Below
is the short description that I wrote for the Convention to explain what
the Dienst subset is about.  The idea is to link from this short
description to the full description of the required subset of Dienst:

...

Detailed technical specifications for the openArchive Dienst subset are
available at http://www.openarchives.org/sfc/dienst.  A very brief
description is presented here, to support a better understanding of the
Convention.  

The Dienst protocol has an HTTP-based implementation.  Its openArchive
subset uniquely defines the 3-step procedure, as well as the
corresponding syntax, that must be used to selectively harvest metadata
from archives that are Santa Fe compliant:

· Step 1: the openArchive Dienst subset defines the way in which an
archive can be polled to obtain the following archive-specific
information:
  · The logical partitions implemented in the archive;
  · The metadata formats that are supported for delivery of archive
  metadata in response to a harvesting request.
The Dienst subset also defines the syntax used by the archive to respond
to such requests.  It does not define the set of valid responses to the
metadata format request; those are presented under the heading "Metadata
formats".

· Step 2: the openArchive Dienst subset defines how to request a list
of handles pointing at archive metadata.  A syntax is available to
request:
  · A list of handles for the complete archive;
  · A list of handles for a partition of the archive;
  · A list of handles for documents that have become available in the
  archive after a specified date;
  · A list of handles for documents that have become available in an
  archive partition after a specified date.
The Dienst subset also defines the way in which the list of handles will
be returned.

· Step 3: the openArchive Dienst subset defines how to harvest the
metadata corresponding to the list of handles obtained in Step 2 and
how to specify one of the supported metadata formats (see Step 1) in
which the metadata should be returned.  It does not define the
transportation protocols for the metadata itself; those are presented
under the heading "Metadata formats".

...
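
The three steps above can be sketched in Python as follows.  This is
only an illustration of the control flow, under the assumption that the
archive is modelled as an in-memory object; it does not use the actual
Dienst request syntax, which the detailed specs will have to define.
The class and function names, the "sfmeta" format name and the demo
handles are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple

# handle -> (partition, date it became available, {format name: metadata})
Record = Tuple[str, date, Dict[str, str]]


@dataclass
class InMemoryArchive:
    """Stand-in for a compliant archive; a real one would answer over HTTP."""
    partitions: List[str]
    formats: List[str]
    records: Dict[str, Record] = field(default_factory=dict)

    # Step 1: archive-specific information.
    def list_partitions(self) -> List[str]:
        return list(self.partitions)

    def list_metadata_formats(self) -> List[str]:
        return list(self.formats)

    # Step 2: handles, optionally limited to a partition and/or a date.
    def list_handles(self, partition: Optional[str] = None,
                     after: Optional[date] = None) -> List[str]:
        return sorted(
            handle for handle, (part, added, _meta) in self.records.items()
            if (partition is None or part == partition)
            and (after is None or added > after)
        )

    # Step 3: metadata for one handle in one supported format.
    def get_metadata(self, handle: str, fmt: str) -> str:
        if fmt not in self.formats:
            raise ValueError(f"unsupported metadata format: {fmt}")
        return self.records[handle][2][fmt]


def harvest(archive: InMemoryArchive, fmt: str,
            partition: Optional[str] = None,
            after: Optional[date] = None) -> Dict[str, str]:
    """Run the 3-step procedure and return {handle: metadata}."""
    if fmt not in archive.list_metadata_formats():            # Step 1
        raise ValueError(f"archive does not offer format: {fmt}")
    if partition is not None and partition not in archive.list_partitions():
        raise ValueError(f"unknown partition: {partition}")
    handles = archive.list_handles(partition, after)          # Step 2
    return {h: archive.get_metadata(h, fmt) for h in handles}  # Step 3


def demo_archive() -> InMemoryArchive:
    """Build a tiny archive with made-up handles and metadata."""
    return InMemoryArchive(
        partitions=["cs", "physics"],
        formats=["sfmeta"],
        records={
            "demo/0001": ("cs", date(1999, 10, 1), {"sfmeta": "<record 1>"}),
            "demo/0002": ("cs", date(1999, 11, 10), {"sfmeta": "<record 2>"}),
            "demo/0003": ("physics", date(1999, 11, 12), {"sfmeta": "<record 3>"}),
        },
    )


if __name__ == "__main__":
    # Selective harvest: one partition, only documents added after a date.
    print(harvest(demo_archive(), "sfmeta", partition="cs",
                  after=date(1999, 11, 1)))
```

The four request variants of Step 2 map onto the two optional arguments
of list_handles: neither, partition only, date only, or both.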

Many greetings

herbert

Carl Lagoze wrote:
> 
> In the course of working on our latest release of the Dienst software, we
> did a little thinking about the metadata set we developed at the Santa Fe
> workshop (available at
> http://www.cs.cornell.edu/lagoze/External/UPS/SFMeta.htm).  Both David
> Fielding, the researcher in charge of Dienst development, and I noted a flaw
> in the "Display ID" element.  Recall that there are two IDs:
> 
> 1.      A mandatory Display ID which is a URL to a human readable page
> 2.      An optional Object ID which is a locally scoped URN.
> 
> Our view throughout the design of Dienst (and digital object repositories in
> general) is that a repository is not in the business of human presentation.
> It simply provides sufficient information through a protocol so that other
> services can use its contents.  From the perspective of human interaction, it
> provides protocol requests that can be used by any user interface to
> construct "display pages", i.e., pages that access specific disseminations or
> parts of disseminations.  Thus, there may be many user interfaces and many
> "display IDs" for a particular digital object.  Furthermore, a repository
> does not have any record of what these display IDs are (i.e., does the
> publisher of a book know every house, library, bookstore that their book
> sits in?).
> 
> The display ID metadata element not only presumes that the repository
> or digital object knows about these URLs but also endows one with the
> property of being the "correct" one (a rather wrong concept since the
> display ID for an Italian audience should be different than for a US
> audience).  Furthermore, it imprints it as part of the metadata for the
> digital object, which philosophically is a rather persistent entity - yes,
> objects should be persistent but the user interfaces that present them
> should be malleable.
> 
> For a little idea of how this works in the Dienst software take a look at
> the following example:
> 
> A document with the URN ncstrl.cornell/TR94-1418
> 
> Its display page from the Cornell ncstrl user interface is:
> http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/ncstrl.cornell/TR94-1418
> 
> This information is put together from three protocol requests to the object
> in the Cornell repository:
> 
> 1. One which dumps the formats that the digital object is available in:
> http://cs-tr.cs.cornell.edu/Dienst/Repository/2.0/Formats/ncstrl.cornell/TR94-1418
> 
> 2. One which dumps the bibliographic info for the digital object:
> http://cs-tr.cs.cornell.edu/Dienst/Repository/2.0/Body/ncstrl.cornell/TR94-1418/bib
> 
> 3. One which dumps the terms of access statement for the document:
> http://cs-tr.cs.cornell.edu/Dienst/Repository/1.0/Terms/ncstrl.cornell/TR94-1418
> 
> Another "display ID" for this same digital object is available at:
> http://cs-tr.cs.cornell.edu/Dienst/UI/2.0/Describe/ncstrl.cornell/TR94-1418
> 
> This uses the same raw repository requests to construct its information.
> 
> In fact, this is exactly the way that NCSTRL and XXX/CoRR interact.  Take a
> look at the URL:
> http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/xxx.cs.DL/9812020
> 
> and you will see a document in XXX presented through the NCSTRL user
> interface.  You could go to http://xxx.lanl.gov/archive/cs/intro.html
> and get the same document through the XXX user interface.
> 
> Sorry to assault you with all this detail but we at Cornell have been
> somewhat in the business of trying to get DL protocols correct and this
> "display URL" violates some of our thinking on separation of concerns.  I
> don't have a real good answer here, since the "correct" answer (from the
> Dienst perspective) involves some more burden on the external services
> (understanding more protocol requests).
> 
> Carl
> 
> ------------------------------------------------------
> UPS mail list
> Mail submissions to ups@vole.lanl.gov
> To subscribe or unsubscribe visit
> http://vole.lanl.gov/mailman/listinfo/ups
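
The Dienst example in the quoted message (one URN, three raw repository
requests, two different display pages) can be sketched in Python as
below.  The URL patterns are copied from the example URLs for
ncstrl.cornell/TR94-1418; treating them as templates that work for any
URN is an assumption made for illustration, and the function names are
hypothetical.  The sketch only builds the URLs and does not contact the
server.

```python
from typing import Dict, List

BASE = "http://cs-tr.cs.cornell.edu/Dienst"

# User-interface pages shown in the message; a UI builds each of them
# from the same repository requests. Written exactly as they appear there.
UI_TEMPLATES = [
    "http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/{urn}",
    "http://cs-tr.cs.cornell.edu/Dienst/UI/2.0/Describe/{urn}",
]


def repository_requests(urn: str) -> Dict[str, str]:
    """Return the three raw repository requests for a digital object."""
    return {
        # Formats the digital object is available in.
        "formats": f"{BASE}/Repository/2.0/Formats/{urn}",
        # Bibliographic information for the digital object.
        "bibliography": f"{BASE}/Repository/2.0/Body/{urn}/bib",
        # Terms-of-access statement for the document.
        "terms": f"{BASE}/Repository/1.0/Terms/{urn}",
    }


def display_urls(urn: str) -> List[str]:
    """Return the known display pages; other UIs could add more."""
    return [template.format(urn=urn) for template in UI_TEMPLATES]


if __name__ == "__main__":
    urn = "ncstrl.cornell/TR94-1418"
    for name, url in repository_requests(urn).items():
        print(f"{name}: {url}")
    for url in display_urls(urn):
        print(f"display: {url}")
```

The point of the separation is visible in the data: the repository
requests depend only on the URN, while the list of display URLs depends
on which user interfaces happen to exist.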