[UPS] Problems/Comments with Santa Fe Metadata Set

herbert van de sompel herbert.vandesompel@rug.ac.be
Thu, 18 Nov 1999 09:07:17 +0100


Hi all,

I would like to add some thoughts to the important discussion on the
Display URL & object ID raised by Carl and replied to by Mark & Thomas. 
I had some e-mail exchanges about this with Mark too.  Based on the
knowledge we have now about the archives that were part of the proto, I
feel like suggesting the following enhancement to Santa Fe Set:

- Remove tags Display ID and object ID
- Add tag Location [R], which has 2 subtags (cf Author field):
  * Link-to-URL : the URL that points to a human-readable page that
    provides access to the content referred to by the metadata.
  * Object ID : a unique identifier for the content referred to by the
    metadata, from which a Link-to-URL can be constructed
    algorithmically
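
To make the proposal concrete, here is a minimal sketch in Python of how a
Location entry and the construction of a Link-to-URL from an Object ID might
look. It is only an illustration, not part of the Santa Fe Set: the class
name Location, its field names, the build_link helper and the service keys
are assumptions. The two URL templates are inferred from example URLs quoted
later in this thread, and treating them as general patterns is also an
assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Templates inferred from example URLs quoted in this thread (assumption:
# that these examples generalise to every identifier of the same archive).
TEMPLATES = {
    "xxx": "http://xxx.lanl.gov/abs/{object_id}",
    "ncstrl-ui": "http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/{object_id}",
}


@dataclass
class Location:
    """One occurrence of the (repeatable) Location tag; names are hypothetical."""
    link_to_url: Optional[str] = None  # subtag Link-to-URL
    object_id: Optional[str] = None    # subtag Object ID


def build_link(location: Location, service: str) -> Optional[str]:
    """Return a display URL: the explicit Link-to-URL if present,
    otherwise one constructed from the Object ID and a service template."""
    if location.link_to_url:
        return location.link_to_url
    if location.object_id:
        return TEMPLATES[service].format(object_id=location.object_id)
    return None


print(build_link(Location(object_id="cs.CL/9911006"), "xxx"))
```

The same Object ID can be fed to different templates, so an overlay service
that knows the pattern does not need the archive to list every display URL.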

I tend to feel that Location should be Mandatory (at least one of the
subtags needs to be filled out), but I think that somehow causes
problems for archives that only perform the basic functions of submission
& long-term storage (no end-user functions) AND have records without
full-text associated with the metadata.  For content with full-text in
such archives, the Location field could point at the full-text
directly.  But what if there is no full-text?  There is no way Location
can point at a metadata display page because the archive is not capable
of displaying it.  In that case - cf RePEc - there would be metadata
WITHOUT Location.  Still, we should take into account that there WILL be
overlay services for such basic archives too, because after all the
archives are there so that their content would be disseminated.  Basic
archives live by virtue of overlay services.  This means that even if
the archive itself does not provide a display mechanism, (an) overlay
service(s) will.  That is the case for RePEc with - amongst others -
Ideas.  As such, in the RePEc world, the RePEc unique identifier could
be filled out in "object ID" and it could be used to construct a
Link-to-URL into an overlay service.  Would that make Location a Mandatory
field after all?
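
As a rough sketch of that rule (at least one subtag filled out, with an
overlay service supplying the display page when the archive has none),
something like the following could work. The function names, the overlay
URL template and the sample identifier are all hypothetical and chosen only
for illustration; they are not the actual RePEc or Ideas conventions.

```python
from typing import Optional


def location_is_valid(link_to_url: Optional[str] = None,
                      object_id: Optional[str] = None) -> bool:
    """Mandatory rule as proposed: at least one subtag must be filled out."""
    return bool(link_to_url) or bool(object_id)


def display_url(link_to_url: Optional[str] = None,
                object_id: Optional[str] = None,
                overlay_template: Optional[str] = None) -> Optional[str]:
    """Prefer the archive's own Link-to-URL; otherwise let an overlay
    service build a display URL from the Object ID."""
    if link_to_url:
        return link_to_url
    if object_id and overlay_template:
        return overlay_template.format(object_id=object_id)
    return None


# Hypothetical overlay template and identifier, for illustration only.
OVERLAY = "http://overlay.example.org/display/{object_id}"
print(display_url(object_id="example-archive:paper:0001",
                  overlay_template=OVERLAY))
```

A basic archive would then only need to supply the Object ID, and each
overlay service would plug in its own template.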

We need to make decisions on this matter.  We also still have an open
discussion about the Optional addition of a "Journal" or "Refereed"
tag.  Both the Display issue and the Journal issue are very important
and deserve our attention.

many greetings

herbert



Mark Doyle wrote:
> 
> Greetings Carl,
> 
> > From: Carl Lagoze <lagoze@cs.cornell.edu>
> > Date: 1999-11-15 06:41:17 -0500
> 
> Sorry, I was unable to answer this sooner... Since I was the one who
> initiated the addition of this element, I feel I should address it. I
> understand your point of view, but I think that we live in an imperfect world
> and one needs to have pragmatic solutions to otherwise vexing problems. The
> whole point of these repositories and overlaid services is to make material
> available to researchers in a variety of formats, some of which may be much
> richer than others (the variation is both within a repository and across
> repositories). Formats may be added and removed as the underlying technology
> changes. Any service which chooses to just display a subset of a repository's
> formats (say, just PostScript or PDF) is likely to shortchange users. For
> instance, xxx offers many flavors of PostScript, some of which require the
> user to understand additional issues (e.g., font installation). So the simple
> goal (again in the context of doing things on the six month scale) is to
> give users a path to the definitive interface of a repository, preferably
> anchored around the target that the user is actually interested in at the
> moment. I feel it is much more useful to a user of the services to wind up at a
> "wrapper" page than at just the home page of the arXiv. Furthermore, the URL's
> in the display ID are to be persistent and freely accessible (some
> repositories may have to limit who can access certain components and a
> mechanism for authentication has to be made available).
> 
> > Our view throughout the design of Dienst (and digital object repositories in
> > general) is that a repository is not in the business of human presentation.
> > It simply provides sufficient information through a protocol so that other
> > services can use its contents.  From the perspective of human interaction, it
> > provides protocol requests that can be used by any user interface to
> > construct "display pages", that is, pages that access specific disseminations or
> > parts of disseminations.  Thus,  there may be many user interfaces and many
> > "display Ids" for a particular digital object. Furthermore, a repository
> > does not have any record of what these display Ids are (i.e., does the
> > publisher of a book know every house, library, bookstore that their book
> > sits in).
> 
> This is all well and good in theory, but where the rubber hits the road, I
> think it fails. Not all repositories are the same. The selection of
> repository services that an overlay service makes visible to the reader is
> not likely to be the complete set of services. This is a disadvantage to the
> users who may not even be aware that they have other choices for retrieval of
> information.
> 
> > The display ID metadata element presumes that not only does the repository
> > or digital object know about these URLs but endows one with the property of
> > being the "correct" one (a rather wrong concept since the display ID for an
> > Italian audience should be different than for a US audience).
> 
> I strongly disagree. The fact is that most repositories have a definitive
> wrapper page that provides links to all available repository services
> relevant to a particular item in the repository (and a dynamic set of
> services at that). To use phrasing from physics, this is a "natural" URL -
> "naturalness" is not a statement about correctness (as you imply), but
> rather, it allows for a choice of a distinguished member of a class (here,
> the class of display URL's).  There may be other ways to make the choice
> (just as natural), but each arXiv has a very good sense of which URL is
> potentially the most useful to end users. The mere fact that these stable,
> persistent URLs exist and are made available by the repository distinguishes
> them from the rest of the URLs.
> 
> Should all overlays be required to track all of the services and mirrors of
> its underlying repositories? I think that is what your point of view requires
> (and from below, you seem to acknowledge this). You seem to want to keep
> users confined to a specific box without even giving them a chance to see
> that there may be more in the world than the box you give them.
> 
> > Furthermore
> > it imprints it as part of the metadata for the digital object, which
> > philosophically is a rather persistent entity - yes, objects should be
> > persistent but the user interfaces that present them should be malleable.
> 
> The URLs were meant to be as persistent as the object itself. The
> malleability is in what the URL points to, not what the URL is. URLs are not
> inherently non-persistent.
> 
> > For a little idea of how this works in the Dienst software take a look at
> > the following example:
> >
> > A document with the URN ncstrl.cornell/TR94-1418
> >
> > Its display page from the Cornell ncstrl user interface is:
> > http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/ncstrl.cornell/TR94-1418
> 
> > This information is put together from three protocol requests to the object
> > in the cornell repository:
> 
> [A rich set of wonderful examples deleted]
> 
> > This uses the same raw repository requests to construct its information.
> >
> > In fact, this is exactly the way that NCSTRL and XXX/CoRR interact.  Take a
> > look at the URL:
> > http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/xxx.cs.DL/9812020
> >
> > and you will see a document in XXX presented through the NCSTRL user
> > interface.  You could go to http://xxx.lanl.gov/archive/cs/intro.html
> > and get the same document  through the XXX User interface.
> 
> This is the main counter example right here (thanks for providing it!). I do
> not object to your presentation of the information through the NCSTRL
> interface (having the uniform interface is quite nice), but I do not
> understand why you don't give the user the natural URL
> http://xxx.lanl.gov/abs/cs/9812020. Why force the user to navigate from
> http://xxx.lanl.gov/archive/cs/intro.html?  Actually, this example isn't
> really the best because your article is only available in a single format.
> Instead, I give http://xxx.lanl.gov/abs/cs.CL/9911006
> (http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/xxx.cs.CL/9911006)
> with source, pdf, and other formats (dvi and about 8 flavors of PS).   You
> suppress source and dvi, and you chose a single resolution of bitmapped fonts
> for the PS (I prefer resolution independent type I fonts). Where would
> NCSTRL give me an opportunity for discovering that I can choose a mirror,
> that I can choose a default download format (or even that other choices
> exist), that author names are conveniently linked for searching, or, in the
> case of some physics archives, that xxx provides "cited by" and "refers to"
> links?
> 
> > Sorry to assault you with all this detail but we at Cornell have been
> > somewhat in the business of trying to get DL protocols correct and this
> > "display URL" violates some of our thinking on separation of concerns.  I
> > don't have a real good answer here, since the "correct" answer (from the
> > Dienst perspective) involves some more burden on the external services
> > (understanding more protocol requests).
> 
> Exactly. My point is that there exist natural URLs which may give enhanced
> services to users. It may be that some repositories will just give the Dienst
> display URL and be done with it. But I submit that the majority of
> repositories will function not just as faceless warehouses, but will also
> present their own particular view of the world, will have a persistent URL
> mechanism for accessing that view, and some set of users will find benefit in
> the repository's view. I think you need to change your vocabulary a bit. Try
> "natural" or "canonical" rather than "correct."
> 
> All that said, I might be persuaded that the display ID doesn't have to be
> mandatory, but I think the act of a repository committing to persistent
> natural URLs (i.e., the notion of making them readable as well as writable)
> is one of the foundational principles for making them function as true
> repositories. Thus, I don't think any repository should choose to omit it.
> Nor do I think any overlay should throw away this item of information if it
> is provided.
> 
> Cheers,
> Mark
> 
> ------------------------------------------------------
> UPS mail list
> Mail submissions to ups@vole.lanl.gov
> To subscribe or unsubscribe visit http://vole.lanl.gov/mailman/listinfo/ups