[UPS] Problems/Comments with Santa Fe Metadata Set

Carl Lagoze lagoze@cs.cornell.edu
Mon, 15 Nov 1999 06:40:01 -0500


In the course of working on our latest release of the Dienst software, we
did a little thinking about the metadata set we developed at the Santa Fe
workshop (available at
http://www.cs.cornell.edu/lagoze/External/UPS/SFMeta.htm).  Both David
Fielding, the researcher in charge of Dienst development, and I noted a flaw
in the "Display ID" element.  Recall that there are two IDs:

1.	A mandatory Display ID, which is a URL to a human-readable page
2.	An optional Object ID, which is a locally scoped URN.

Our view throughout the design of Dienst (and digital object repositories in
general) is that a repository is not in the business of human presentation.
It simply provides sufficient information through a protocol so that other
services can use its contents.  From the perspective of human interaction, it
provides protocol requests that any user interface can use to construct
"display pages", which are pages that access specific disseminations or
parts of disseminations.  Thus, there may be many user interfaces and many
"display IDs" for a particular digital object.  Furthermore, a repository
has no record of what these display IDs are (just as the publisher of a book
does not know every house, library, or bookstore in which the book sits).

The display ID metadata element not only presumes that the repository or
digital object knows about these URLs but also endows one of them with the
property of being the "correct" one (a rather wrong concept, since the display
ID for an Italian audience should differ from the one for a US audience).
Furthermore, it imprints that URL as part of the metadata for the digital
object, which philosophically is a rather persistent entity: objects should
be persistent, but the user interfaces that present them should be malleable.
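To make the alternative concrete, here is a minimal sketch, assuming a record that carries only the persistent object ID and letting each user interface derive its own display URL from it. The class, dictionary, and function names are hypothetical; the two URL patterns are simply the Cornell UI addresses quoted in the example below, treated as string templates (an assumption, not how Dienst actually builds them).

```python
from dataclasses import dataclass


# Hypothetical metadata record: only the persistent object ID (URN-style handle),
# deliberately with no "display ID" field.
@dataclass(frozen=True)
class ObjectMetadata:
    object_id: str


# Display-URL patterns for two user interfaces, copied from the example URLs in
# this message. Treating them as plain string templates is an assumption.
UI_TEMPLATES = {
    "ncstrl-ui-1.0-display": "http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/{handle}",
    "ncstrl-ui-2.0-describe": "http://cs-tr.cs.cornell.edu/Dienst/UI/2.0/Describe/{handle}",
}


def display_urls(record: ObjectMetadata) -> dict[str, str]:
    """Derive display URLs on the UI side; the repository stores none of them."""
    return {ui: tpl.format(handle=record.object_id) for ui, tpl in UI_TEMPLATES.items()}


if __name__ == "__main__":
    rec = ObjectMetadata(object_id="ncstrl.cornell/TR94-1418")
    for ui, url in display_urls(rec).items():
        print(ui, url)
```

Adding another interface (say, one aimed at an Italian audience) would mean adding a template on the UI side; the object's metadata never changes.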

For an idea of how this works in the Dienst software, take a look at
the following example:

A document with the URN ncstrl.cornell/TR94-1418

Its display page from the Cornell NCSTRL user interface is:
http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/ncstrl.cornell/TR94-1418

This information is put together from three protocol requests to the object
in the Cornell repository:

1. One which dumps the formats that the digital object is available in:
http://cs-tr.cs.cornell.edu/Dienst/Repository/2.0/Formats/ncstrl.cornell/TR94-1418

2. One which dumps the bibliographic info for the digital object:
http://cs-tr.cs.cornell.edu/Dienst/Repository/2.0/Body/ncstrl.cornell/TR94-1418/bib

3. One which dumps the terms of access statement for the document:
http://cs-tr.cs.cornell.edu/Dienst/Repository/1.0/Terms/ncstrl.cornell/TR94-1418
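A rough sketch of how a user interface might compose such a page from the three requests above. The verb names (Formats, Body, Terms) and protocol versions are copied from the URLs; the helper names, the page layout, and the injected `fetch` function are hypothetical, and a stub replaces real HTTP so the sketch runs offline.

```python
from typing import Callable

REPOSITORY_BASE = "http://cs-tr.cs.cornell.edu/Dienst/Repository"


def repository_requests(handle: str) -> dict[str, str]:
    """Build the three raw repository requests for one digital object."""
    return {
        "formats": f"{REPOSITORY_BASE}/2.0/Formats/{handle}",
        "bib": f"{REPOSITORY_BASE}/2.0/Body/{handle}/bib",
        "terms": f"{REPOSITORY_BASE}/1.0/Terms/{handle}",
    }


def build_display_page(handle: str, fetch: Callable[[str], str]) -> str:
    """Compose a plain-text display page from the three repository responses.

    `fetch` is supplied by the user interface (e.g. an HTTP GET); the layout
    below is entirely the UI's choice, not the repository's.
    """
    parts = {name: fetch(url) for name, url in repository_requests(handle).items()}
    return "\n\n".join([
        f"Document {handle}",
        "Bibliographic data:\n" + parts["bib"],
        "Available formats:\n" + parts["formats"],
        "Terms of access:\n" + parts["terms"],
    ])


if __name__ == "__main__":
    def fake_fetch(url: str) -> str:
        # Stub transport; a real UI would issue the HTTP request here.
        return f"<response from {url}>"

    print(build_display_page("ncstrl.cornell/TR94-1418", fake_fetch))
```

The repository only answers the three requests; everything about how the page looks lives in `build_display_page`, which is the separation of concerns argued for above.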

Another "display ID" for this same digital object is available at:
http://cs-tr.cs.cornell.edu/Dienst/UI/2.0/Describe/ncstrl.cornell/TR94-1418

This uses the same raw repository requests to construct its information.
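Put another way, each user interface is just a different renderer over the same raw responses. The sketch below assumes two hypothetical renderers standing in for the Display and Describe interfaces; the layouts and placeholder response strings are invented for illustration and do not reflect what the real Dienst pages show.

```python
from typing import Callable, Mapping

# Raw repository responses keyed by request type ("bib", "formats", "terms").
Responses = Mapping[str, str]
Renderer = Callable[[str, Responses], str]


def render_display(handle: str, r: Responses) -> str:
    """Hypothetical full display page: bibliography, formats, and terms of access."""
    return f"{handle}\n{r['bib']}\nFormats: {r['formats']}\nTerms: {r['terms']}"


def render_describe(handle: str, r: Responses) -> str:
    """Hypothetical terser description page built from the same responses."""
    return f"{handle}: {r['bib']} [{r['formats']}]"


# Each entry is one user interface, and hence one more "display ID" for the object.
USER_INTERFACES: dict[str, Renderer] = {
    "UI/1.0/Display": render_display,
    "UI/2.0/Describe": render_describe,
}

if __name__ == "__main__":
    responses = {"bib": "bibliographic data", "formats": "format list", "terms": "terms text"}
    for name, render in USER_INTERFACES.items():
        print(f"--- {name} ---")
        print(render("ncstrl.cornell/TR94-1418", responses))
```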

In fact, this is exactly the way that NCSTRL and XXX/CoRR interact.  Take a
look at the URL:
http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/xxx.cs.DL/9812020

and you will see a document in XXX presented through the NCSTRL user
interface.  You could go to http://xxx.lanl.gov/archive/cs/intro.html
and get the same document through the XXX user interface.

Sorry to assault you with all this detail, but we at Cornell have been
somewhat in the business of trying to get DL protocols correct, and this
"display URL" violates some of our thinking on separation of concerns.  I
don't have a really good answer here, since the "correct" answer (from the
Dienst perspective) places more burden on the external services, which
would have to understand more protocol requests.

Carl