[OAI-implementers] XSD file for qualified DC

Carl Lagoze lagoze@cs.cornell.edu
Fri, 21 Jun 2002 07:01:23 -0400


Summary for the casual reader:

1. Detailed citation data about a resource and data about its references
are important information to make available.
2. This information does not fit into the purposely simplistic (and very
useful) DC data model (qualified or unqualified).
3. Since OAI-PMH accommodates multiple metadata formats, why try to cram
the information into DC rather than make it available in a more
specialized format?

Now to respond to Tim and for the more interested reader....


Tim, 

I'm trying to parse out where we disagree :-(

From my point of view this is one of those "how much do we shove into
DC" questions that has been kicked around for quite a while.  Somehow
these keep getting conflated with the qualifier argument, which has a
pretty legitimate model now associated with it.

For qualifiers, the model is more or less an "is a" relationship for
elements and values.  For example, "created" makes a proper qualifier
for "date" since "date created is a date".  Similarly, "LCSH" makes a
proper qualifier for "subject" since an "LCSH subject is a subject".
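
A rough sketch of that "is a" test, in Python.  The Statement triple and
the dumb_down helper are names made up for illustration, not part of any
DC specification or library, and the values are placeholders.  The point
is only that dropping the qualifier leaves a statement that is still
true:

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    """Hypothetical model of one qualified DC statement."""
    element: str               # e.g. "date", "subject"
    qualifier: Optional[str]   # e.g. "created", "LCSH", or None
    value: str


def dumb_down(stmt: Statement) -> Statement:
    """Drop the qualifier; under the 'is a' model the result stays valid."""
    return Statement(stmt.element, None, stmt.value)


# Illustrative values only.
qualified = [
    Statement("date", "created", "2002-06-21"),
    Statement("subject", "LCSH", "Metadata"),
]

simple = [dumb_down(s) for s in qualified]
for s in simple:
    print(f"dc:{s.element} = {s.value}")
```

A "date created" is still a date and an "LCSH subject" is still a
subject, so nothing false is asserted after the qualifier is removed.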

The model starts to crumble at the edges when one tries to introduce
"has a" relationships within the DC element structure.  The DC community
has, for example, continually struggled over "agent" descriptors to say
things like "the creator is Carl Lagoze and he is affiliated with
Cornell University", which effectively introduces a new intermediate
entity, which in pseudo E-R language is "R has a creator CL that has an
affiliation Cornell".

This is exactly what is happening when one tries to say things like:

<meta name="DC.Identifier.citation" 
content="Library and Information Science Research 22(3), 311-338
(2000)">

Here the entity Library and Information Science Research is being
introduced, and it "has a" volume "22" and an issue "3".
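
To make the hidden structure concrete, here is a minimal Python sketch
that pulls the pieces back out of that one literal.  The regular
expression is an assumption that fits this particular string shape only
(title, volume(issue), page range, year); it is not a general citation
parser, and the field names are invented for the example:

```python
import re

CITATION = "Library and Information Science Research 22(3), 311-338 (2000)"

# Assumed shape: <journal> <volume>(<issue>), <start>-<end> (<year>)
PATTERN = re.compile(
    r"^(?P<journal>.+?)\s+(?P<volume>\d+)\((?P<issue>\d+)\),\s*"
    r"(?P<start>\d+)-(?P<end>\d+)\s+\((?P<year>\d{4})\)$"
)


def parse_citation(text):
    """Return the named parts of the citation, or None if the shape differs."""
    match = PATTERN.match(text)
    return match.groupdict() if match else None


parsed = parse_citation(CITATION)

# The "has a" chain that the flat string was hiding:
# resource -> journal -> volume -> issue
nested = {
    "journal": {
        "title": parsed["journal"],
        "volume": {"number": parsed["volume"], "issue": parsed["issue"]},
    },
    "pages": (parsed["start"], parsed["end"]),
    "year": parsed["year"],
}
print(nested)
```

Every harvester that wants the volume or issue has to guess at a pattern
like this one, which is the cost of serializing an entity into a literal.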

This is bad enough, but now if one also wants to express sentences like
"A references B, which has the bibliographic data 'Journal of silly
results, June 2000, 40(3)'", we are going way down the entity chain.

This is NOT Dublin Core, which has and should have a very limited data
model.  It IS important metadata about an item and we should have a way
of expressing it.  Fine, let's come up with some format and make it
available through OAI-PMH.
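
As a sketch of what that looks like to a harvester: the same item is
requested twice with the GetRecord verb, once per metadataPrefix.  The
base URL, the item identifier and the "citation" prefix are all
hypothetical; only "oai_dc" and the GetRecord arguments come from the
protocol itself:

```python
from urllib.parse import urlencode

BASE_URL = "http://repository.example.org/oai"   # hypothetical repository
ITEM_ID = "oai:repository.example.org:1234"      # hypothetical item


def get_record_url(identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request for one item in one format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{BASE_URL}?{query}"


# Same item, two parallel records: simple DC and a specialized format.
dc_url = get_record_url(ITEM_ID, "oai_dc")
citation_url = get_record_url(ITEM_ID, "citation")  # assumed prefix name

print(dc_url)
print(citation_url)
```

The DC record stays simple, and the structured data travels in its own
record under the same item.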


Carl

> -----Original Message-----
> From: Tim Brody [mailto:tim@tim.brody.btinternet.co.uk] 
> Sent: Thursday, June 20, 2002 12:06 PM
> To: Carl Lagoze; Ann Apps
> Cc: herbert van de sompel; oai-implementers@oaisrv.nsdl.cornell.edu
> Subject: Re: [OAI-implementers] XSD file for qualified DC
> 
> 
> Hi Carl et al,
> 
> The short answer is I disagree with you Carl, but please read on ...
> 
> [Parallel meta formats]
> 
> I would argue it is marginally beneficial to have both the relations
> ("reference data") and metadata ("citation data", or "context") in the
> same record. In the same record it is the XML that defines the relation,
> in parallel metadata formats it is the format (breaking the model of
> records being self-contained).
>
> Regardless, there will always need to be at least two metadata formats:
> unqualified DC, and some "relatedto" aware format.
> 
> [Abstract view]
> 
> I'm interested in making links from A to B. To achieve that needs a
> comparable thing contained in A that is also contained in B. The "thing"
> is either a URI, a string, or structured data:
> 
> <a>
> <relatedto>
>     <identifier/>
>     <unstructured_metadata_string/>
>     <structured_metadata/>
> </relatedto>
> </a>
> 
> <b>
> <identifier/>
> <unstructured_metadata_string/>
> <structured_metadata/>
> </b>
> 
> Instantiating this in DC:
> 
> "dc:identifier is an unambiguous reference to a resource" - citations
> are intended to be unambiguous references.
> "dc:relation is a reference to a related resource" - I believe reference
> data falls under this description.
>
> So unqualified DC can contain all the data that I'm interested in, and
> can be heuristically parsed later on (just as most OAI services do now
> with dc.creator and dc.date, and I do now with dc:identifier).
> 
> The reason for exposing more structure than strings in OAI is so:
> 
> 1) Structured source documents don't lose that structure (e.g. BMC XML
> articles) and,
> 2) That Gateway services can provide structured data to downstream
> services (e.g. Citebase, CERN)
>
> As I understand DC qualifiers, they give greater structure without
> breaking the principle of lowest-common denominator (i.e. remove all the
> qualifications and it's still understandable and actionable by a system
> that talks DC):
> 
> <a_dc>
> <dc:relation>Smith, John (1992) Functional Principles in Functions v5
> 44-50</dc:relation>
> </a_dc>
> 
> Passed through a citation-aware gateway would produce:
> 
> <a_dcq>
> <dcq:relation.references>
> <dcterms:citation>Smith, John (1992) "Functional Principles in
> Functions" v5 44-50</dcterms:citation>
> <dcq:identifier.citation>
> <dc:creator>Smith, John</dc:creator>
> <dc:date>1992</dc:date>
> <dc:title>Functional Principles in Functions</dc:title>
> <ja:volume>5</ja:volume>
> </dcq:identifier.citation>
> </dcq:relation.references>
> </a_dcq>
> 
> And could be linked against B using {creator,title,date}:
> 
> <b>
> <dc:identifier>doi:24-23123/xxxyyy</dc:identifier>
> <dc:identifier>Smith, John, Smith, Joan (1992) Functional Principles in
> Functions, J-Principles volume 5 45-50</dc:identifier>
> <dc:creator>Smith, John</dc:creator>
> <dc:creator>Smith, Joan</dc:creator>
> <dc:title>Functional Principles in Functions</dc:title>
> <dc:date>1992</dc:date>
> </b>
> 
> I've no doubt the same can be achieved through OpenURL, but DC is more
> widely used (so the closer to DC, and the more use of DC elements, the
> greater the adoption will be, hence the greater the interoperability).
> 
> I don't like marc XML because IMHO "<id type='xxx'>", where xxx is a
> code, is not useful beyond library systems (i.e. marc XML isn't
> comprehensible to the human-reader, at which point you may as well use
> more efficient binary mark-up).
> 
> All the best,
> Tim.
> 
> ----- Original Message -----
> From: "Carl Lagoze" <lagoze@cs.cornell.edu>
> To: "Ann Apps" <ann.apps@man.ac.uk>
> Cc: "herbert van de sompel" <herbertv@lanl.gov>;
> <oai-implementers@oaisrv.nsdl.cornell.edu>
> Sent: Thursday, June 20, 2002 12:58 PM
> Subject: RE: [OAI-implementers] XSD file for qualified DC
> 
> 
> > Ann,
> >
> > Thanks for the clarifications here.  Yes, I understand the overloading
> > of the term "citation".  My colleague Donna Bergmark here at Cornell in:
> >
> > Bergmark, D. and Lagoze, C., "An Architecture for Automatic Reference
> > Linking," presented at 5th European Conference on Research and Advanced
> > Technology for Digital Libraries, Darmstadt, Germany, 2001,
> >
> > Was much more systematic in calling the in-links in the link graph
> > "citations" and the out links "references"; in that sense we should then
> > really talk about "citation data" as your category one below (the
> > bibliographic information for the resource itself) and "reference data"
> > as your category two below (bibliographic information for the resources
> > referenced by the resource).
> >
> > Using this terminology I think we all agree that putting reference data
> > into Dublin Core is not right.  This is very much a "one-to-one"
> > violation in that it would involve putting metadata about another
> > resource into the metadata container of some source resource.  Thus,
> > there is a clear application of some parallel metadata form to expose
> > the reference data; probably following the openURL, bison-fute concepts
> > that Herbert has outlined.
> >
> > Turning attention to the citation data issue, I will argue equally
> > strongly that slotting these into the dc identifier element is
> > inappropriate.  Citation data is implicitly structured whereas dc
> > elements should be simply "appropriate literals" as defined by Tom
> > Baker.  Playing a syntactic trick and serializing that data into an
> > "appropriate literal" through the use of punctuation such as "Library
> > and Information Science Research 22(3), 311-338 (2000)" as you suggest
> > in http://epub.mimas.ac.uk/DC/citproposal.html, seems ill-advised with
> > data that screams out for markup such as:
> >
> > <citation>
> > <journalTitle>Library and Information Science
> > Research</journalTitle>
> > <journalVolume>22</journalVolume>
> > ....
> > </citation>
> >
> > Since this explicit structure is not currently allowed in DC (and I
> > question whether it ever should be) and given the fact that OAI-PMH is
> > quite happy expressing parallel structured form, it might be time to
> > write the schema for such citation data and encourage people to expose
> > it for harvesting, and not characterize it as "dublin core".
> >
> > Carl
> >
> > >
> > > On Wed, 19 Jun 2002, Ann Apps wrote:
> > >
> > > > Herbert,
> > > >
> > > >
> > > > I agree entirely with your suggestion about using OpenURL as a
> > > > parallel metadata format.
> > > >
> > > >
> > > > > However, as the question which started this was about qualified DC, I
> > > > > would like to point out that there may be some confusion about the
> > > > > meaning of 'citation', especially about the DC-Citation stuff, which
> > > > > has also been referred to as connected with OpenURL by the Ariadne
> > > > > paper (http://www.ariadne.ac.uk/issue27/metadata/). A confusion which
> > > > > probably wasn't helped by my earlier email.
> > > >
> > > >
> > > > > The term 'citation' is used to describe 2 similar but different
> > > > > things. It is easiest to describe this for journal articles.
> > > >
> > > >
> > > > > 1. The bibliographic citation information (journal, issue, pagination)
> > > > > for an article as part of the metadata for the article itself. This is
> > > > > what publishers refer to as the header information for the article.
> > > > >
> > > > >
> > > > > 2. The citation information for papers cited by an article which are
> > > > > listed in the references section of the article.
> > > >
> > > >
> > > > > The DC-Citation work is, so far, about (1). Maybe the choice of the
> > > > > term 'citation' was unfortunate, because everyone assumes it means
> > > > > (2), but it's difficult to think of a better word. This is why the
> > > > > encoding suggested for dc-citation is within a dc:identifier element,
> > > > > because of the recognition that the bibliographic citation can
> > > > > effectively identify the article. [This could obviously be
> > > > > extrapolated to (2) but would be within a
> > > > > dc:relation/dcterms:references element.]
> > > >
> > > >
> > > > > The scenario you describe is for citation (2). Here the parallel
> > > > > metadata format within a context object you describe looks perfect.
> > > > > This is obviously a major OAI requirement, for initiatives such as
> > > > > Citebase.
> > > >
> > > >
> > > > > But I think that citation (1) will also be needed as OAI is used for
> > > > > more than just eprints repositories. For instance, if you wanted to
> > > > > provide OAI records from an A+I database, or a journal article table
> > > > > of contents database, you would need to be able to detail the
> > > > > journal/issue information within each record. I could see this being
> > > > > of use for harvesting records for the latest journal issues available
> > > > > in such a service. I think you can still use the OpenURL metadata for
> > > > > this but that it would be 'nested' within the DC record, similar to
> > > > > the noddy example I previously wrote. At the moment we're still stuck
> > > > > with using unrecognised DC structured values in literal strings within
> > > > > simple DC to pass this information around.
> > > >
> > > >
> > > > > But at present, I think that the OAI priority is citations(2), and
> > > > > this current development looks really promising. Citations(1) will
> > > > > need more discussion within DC.
> > > >
> > > >
> > > > Best wishes,
> > > >
> > > > Ann
> > > >
> > > >
> > > >
> > > > On Tue, 18 Jun 2002 herbert van de sompel wrote:
> > > >
> > > >
> > > > > > 1. In the context of the OAI-PMH, it would make a lot of sense to
> > > > > > treat citations as a parallel metadata format.  The unqualified DC
> > > > > > record describes the "paper", whereas another record (under the same
> > > > > > item) describes all the citations made in the "paper".  That is what
> > > > > > Carl suggested in his mail.  And that is the approach that Stevan
> > > > > > Harnad and I discussed at last year's OAI-related conference in
> > > > > > Geneva.  This approach makes sense in that it is extensible: it allows
> > > > > > other stuff related to the "paper" (for instance usage logs,
> > > > > > certification metadata, preservation metadata, etc.) to be treated in
> > > > > > yet other parallel records under the same item.
> > > > > >
> > > > > > 2. When it comes to choosing a "metadata format" to describe those
> > > > > > citations, looking at OpenURL makes a lot of sense.  Not only because
> > > > > > it is becoming a standard, but because its purpose really IS to
> > > > > > describe stuff (read "citations" in this context) by building on a
> > > > > > broad range of identifier-namespaces and a multitude of metadata
> > > > > > formats.  Moreover, OpenURL allows not only for the description of a
> > > > > > "citation" but (optionally) also of entities that make up the context
> > > > > > in which the "citation" appears.  That is very significant when
> > > > > > thinking about the possibility of open linking at the level of OAI
> > > > > > service providers. And it is significant when thinking of using
> > > > > > "OpenURL" as a parallel metadata format, as it allows the citation to
> > > > > > remain attached to the thing in which it is cited.
> > > > > >
> > > > > [...]
> > > > >
> > > >
> > > 
> ----------------------------------------------------------------------
> > > > ----
> > > > Mrs. Ann Apps. Senior Analyst - Research & Development, MIMAS,
> > > >      University of Manchester, Oxford Road, Manchester, 
> M13 9PL, UK
> > > > Tel: +44 (0) 161 275 6039    Fax: +44 (0) 0161 275 6040
> > > > Email: ann.apps@man.ac.uk  WWW: http://epub.mimas.ac.uk/ann.html
> > > >
> > > --------------------------------------------------------------
> > > ------------
> > > > _______________________________________________
> > > > OAI-implementers mailing list
> > > > OAI-implementers@oaisrv.nsdl.cornell.edu
> > > > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
> > > >
> > >
> > >
> 
>