[OAI-implementers] XSD file for qualified DC

Thu, 20 Jun 2002 17:06:10 +0100

Hi Carl et al,

The short answer is that I disagree with you, Carl, but please read on ...

[Parallel meta formats]

I would argue it is marginally beneficial to have both the relations
("reference data") and the metadata ("citation data", or "context") in the
same record. Within a single record it is the XML that defines the relation;
with parallel metadata formats it is the format that defines it, which
breaks the model of records being self-contained.

Regardless, there will always need to be at least two metadata formats:
unqualified DC, and some "relatedto" aware format.

[Abstract view]

I'm interested in making links from A to B. That requires something
comparable that is contained in both A and B. The "thing" is either a URI,
a string, or structured data:

<a>
<relatedto>
    <identifier/>
    <unstructured_metadata_string/>
    <structured_metadata/>
</relatedto>
</a>

<b>
<identifier/>
<unstructured_metadata_string/>
<structured_metadata/>
</b>

Instantiating this in DC:

"dc:identifier is an unambiguous reference to a resource" - citations are
intended to be unambiguous references.
"dc:relation is a reference to a related resource" - I believe reference
data falls under this description.

So unqualified DC can contain all the data that I'm interested in, and can
be heuristically parsed later on (just as most OAI services do now with
dc.creator and dc.date, and I do now with dc:identifier).
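
As a rough illustration of what "heuristically parsed" could mean here, the
following Python sketch pulls fields out of a citation string shaped like the
dc:relation example in this message. The pattern and the field names
(creator, date, title, volume, spage, epage) are assumptions made for the
example only; real citation strings vary far more than one regular
expression can cover.

```python
import re

# Hypothetical pattern for strings shaped like
# "Surname, Forename (YYYY) Title vN PP-PP"; an assumption for illustration.
CITATION_RE = re.compile(
    r"^(?P<creator>[^()]+?)\s*"        # everything before the year
    r"\((?P<date>\d{4})\)\s*"           # four-digit year in parentheses
    r"(?P<title>.+?)\s+"                # title, matched lazily
    r"v(?P<volume>\d+)\s+"              # volume written as "v5"
    r"(?P<spage>\d+)-(?P<epage>\d+)$"   # page range
)


def parse_citation(text):
    """Return a dict of fields, or None if the string does not fit the pattern."""
    # Collapse line breaks and runs of spaces left by wrapped metadata values.
    normalized = " ".join(text.split())
    match = CITATION_RE.match(normalized)
    return match.groupdict() if match else None


fields = parse_citation(
    "Smith, John (1992) Functional Principles in Functions v5\n44-50"
)
```

A string that does not fit the pattern yields None, so a service could fall
back to treating the value as an opaque string.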

The reason for exposing more structure than strings in OAI is so that:

1) structured source documents don't lose that structure (e.g. BMC XML
articles), and
2) gateway services can provide structured data to downstream services
(e.g. Citebase, CERN).

As I understand DC qualifiers, they give greater structure without breaking
the principle of the lowest common denominator (i.e. remove all the
qualifications and it's still understandable and actionable by a system that
talks DC):

<a_dc>
<dc:relation>Smith, John (1992) Functional Principles in Functions v5
44-50</dc:relation>
</a_dc>

Passing this through a citation-aware gateway would produce:

<a_dcq>
<dcq:relation.references>
<dcterms:citation>Smith, John (1992) "Functional Principles in Functions" v5
44-50</dcterms:citation>
<dcq:identifier.citation>
<dc:creator>Smith, John</dc:creator>
<dc:date>1992</dc:date>
<dc:title>Functional Principles in Functions</dc:title>
<ja:volume>5</ja:volume>
</dcq:identifier.citation>
</dcq:relation.references>
</a_dcq>

And could be linked against B using {creator,title,date}:

<b>
<dc:identifier>doi:24-23123/xxxyyy</dc:identifier>
<dc:identifier>Smith, John, Smith, Joan (1992) Functional Principles in
Functions, J-Principles volume 5 45-50</dc:identifier>
<dc:creator>Smith, John</dc:creator>
<dc:creator>Smith, Joan</dc:creator>
<dc:title>Functional Principles in Functions</dc:title>
<dc:date>1992</dc:date>
</b>
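
A minimal Python sketch of linking on {creator,title,date} might look like
the following. It assumes the reference from A has already been reduced to a
small dictionary, and that record B is served as well-formed XML with the
dc: prefix bound to the Dublin Core element namespace. The function names,
the normalisation and the "every cited creator must appear in B" rule are
all assumptions for illustration, not an agreed matching method.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Record B from the message, with the namespace declared so it parses.
RECORD_B = """<b xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:identifier>doi:24-23123/xxxyyy</dc:identifier>
<dc:creator>Smith, John</dc:creator>
<dc:creator>Smith, Joan</dc:creator>
<dc:title>Functional Principles in Functions</dc:title>
<dc:date>1992</dc:date>
</b>"""


def norm(value):
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())


def match_key(creators, title, date):
    # Creators are kept as a set so that a reference naming only one author
    # can still match a multi-author record.
    return ({norm(c) for c in creators}, norm(title), norm(date))


def record_key(xml_text):
    root = ET.fromstring(xml_text)
    creators = [e.text for e in root.findall(DC + "creator")]
    return match_key(creators, root.findtext(DC + "title"),
                     root.findtext(DC + "date"))


def links_to(reference, record_xml):
    """True if title and date agree and every cited creator appears in the record."""
    ref_creators, ref_title, ref_date = match_key(
        reference["creator"], reference["title"], reference["date"])
    rec_creators, rec_title, rec_date = record_key(record_xml)
    return (ref_title == rec_title and ref_date == rec_date
            and ref_creators <= rec_creators)


reference_in_a = {"creator": ["Smith, John"],
                  "title": "Functional Principles in Functions",
                  "date": "1992"}
linked = links_to(reference_in_a, RECORD_B)
```

Exact matching after normalisation is deliberately strict; a real service
would probably need fuzzier comparison of names and titles.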

I've no doubt the same can be achieved through OpenURL, but DC is more
widely used (so the closer to DC, and the more use of DC elements, the
greater the adoption will be, hence the greater the interoperability).
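
The lowest-common-denominator property of qualifiers mentioned earlier can
be sketched as a "dumb-down" step. The Python below assumes a record that
has been flattened to (name, value) pairs and uses the dotted
dcq:element.qualifier naming from the example in this message, which is not
taken from any published schema. The dc:title value is a made-up
placeholder.

```python
def dumb_down(qualified_name):
    """Map 'dcq:relation.references' to 'dc:relation'; leave other names alone."""
    prefix, _, local = qualified_name.partition(":")
    if prefix != "dcq":
        return qualified_name
    element = local.split(".", 1)[0]  # drop the qualifier after the dot
    return "dc:" + element


def dumb_down_record(pairs):
    """Apply dumb_down to every (name, value) pair of a flattened record."""
    return [(dumb_down(name), value) for name, value in pairs]


qualified = [
    ("dcq:relation.references",
     "Smith, John (1992) Functional Principles in Functions v5 44-50"),
    ("dc:title", "Some citing paper"),  # hypothetical placeholder value
]
simple = dumb_down_record(qualified)
```

A service that only talks unqualified DC would then see an ordinary
dc:relation string and could still act on it.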

I don't like MARC XML because IMHO "<id type='xxx'>", where xxx is a code,
is not useful beyond library systems (i.e. MARC XML isn't comprehensible to
a human reader, at which point you may as well use a more efficient binary
mark-up).

All the best,
Tim.

----- Original Message -----
From: "Carl Lagoze" <lagoze@cs.cornell.edu>
To: "Ann Apps" <ann.apps@man.ac.uk>
Cc: "herbert van de sompel" <herbertv@lanl.gov>;
<oai-implementers@oaisrv.nsdl.cornell.edu>
Sent: Thursday, June 20, 2002 12:58 PM
Subject: RE: [OAI-implementers] XSD file for qualified DC

> Ann,
>
> Thanks for the clarifications here.  Yes, I understand the overloading
> of the term "citation".  My colleague Donna Bergmark here at Cornell in:
>
> Bergmark, D. and Lagoze, C., "An Architecture for Automatic Reference
> Linking," presented at 5th European Conference on Research and Advanced
> Technology for Digital Libraries, Darmstadt, Germany, 2001,
>
> Was much more systematic in calling the in-links in the link graph
> "citations" and the out links "references"; in that sense we should then
> really talk about "citation data" as your category one below (the
> bibliographic information for the resource itself) and "reference data"
> as your category two below (bibliographic information for the resources
> referenced by the resource).
>
> Using this terminology I think we all agree that putting reference data
> into Dublin Core is not right.  This is very much a "one-to-one"
> violation in that it would involve putting metadata about another
> resource into the metadata container of some source resource.  Thus,
> there is a clear application of some parallel metadata form to expose
> the reference data; probably following the openURL, bison-fute concepts
> that Herbert has outlined.
>
> Turning attention to the citation data issue, I will argue equally
> strongly that slotting these into the dc identifier element is
> inappropriate.  Citation data is implicitly structured whereas dc
> elements should be simply "appropriate literals" as defined by Tom
> Baker.  Playing a syntactic trick and serializing that data into an
> "appropriate literal" through the use of punctuation such as "Library
> and Information Science Research 22(3), 311-338 (2000)" as you suggest
> in http://epub.mimas.ac.uk/DC/citproposal.html, seems ill-advised with
> data that screams out for markup such as:
>
> <citation>
> <journalTitle>Library and Information Science
> Research</journalTitle>
> <journalVolume>22</journalVolume>
> ....
> </citation>
>
> Since this explicit structure is not currently allowed in DC (and I
> question whether it ever should be) and given the fact that OAI-PMH is
> quite happy expressing parallel structured form, it might be time to
> write the schema for such citation data and encourage people to expose
> it for harvesting, and not characterize it as "dublin core".
>
> Carl
>
> >
> > On Wed, 19 Jun 2002, Ann Apps wrote:
> >
> > > Herbert,
> > >
> > >
> > > I agree entirely with your suggestion about using OpenURL as a
> > > parallel metadata format.
> > >
> > >
> > > However, as the question which started this was about qualified DC,
> > > I would like to point out that there may be some confusion about the
> > > meaning of 'citation', especially about the DC-Citation stuff, which
> > > has also been referred to as connected with OpenURL by the Ariadne
> > > paper (http://www.ariadne.ac.uk/issue27/metadata/). A confusion
> > > which probably wasn't helped by my earlier email.
> > >
> > >
> > > The term 'citation' is used to describe 2 similar but different
> > > things. It is easiest to describe this for journal articles.
> > >
> > >
> > > 1. The bibliographic citation information (journal, issue,
> > > pagination) for an article as part of the metadata for the article
> > > itself. This is what publishers refer to as the header information
> > > for the article.
> > >
> > >
> > > 2. The citation information for papers cited by an article which are
> > > listed in the references section of the article.
> > >
> > >
> > > The DC-Citation work is, so far, about (1). Maybe the choice of the
> > > term 'citation' was unfortunate, because everyone assumes it means
> > > (2), but it's difficult to think of a better word. This is why the
> > > encoding suggested for dc-citation is within a dc:identifier element,
> > > because of the recognition that the bibliographic citation can
> > > effectively identify the article. [This could obviously be
> > > extrapolated to (2) but would be within a
> > > dc:relation/dcterms:references element.]
> > >
> > >
> > > The scenario you describe is for citation (2). Here the parallel
> > > metadata format within a context object you describe looks perfect.
> > > This is obviously a major OAI requirement, for initiatives such as
> > > Citebase.
> > >
> > >
> > > But I think that citation (1) will also be needed as OAI is used for
> > > more than just eprints repositories. For instance, if you wanted to
> > > provide OAI records from an A+I database, or a journal article table
> > > of contents database, you would need to be able to detail the
> > > journal/issue information within each record. I could see this being
> > > of use for harvesting records for the latest journal issues available
> > > in such a service. I think you can still use the OpenURL metadata for
> > > this but that it would be 'nested' within the DC record, similar to
> > > the noddy example I previously wrote. At the moment we're still stuck
> > > with using unrecognised DC structured values in literal strings
> > > within simple DC to pass this information around.
> > >
> > >
> > > But at present, I think that the OAI priority is citations(2), and
> > > this current development looks really promising. Citations(1) will
> > > need more discussion within DC.
> > >
> > >
> > > Best wishes,
> > >
> > > Ann
> > >
> > >
> > >
> > > On Tue, 18 Jun 2002 herbert van de sompel wrote:
> > >
> > >
> > > > 1. In the context of the OAI-PMH, it would make a lot of sense to
> > > > treat citations as a parallel metadata format.  The unqualified DC
> > > > record describes the "paper", whereas another record (under the same
> > > > item) describes all the citations made in the "paper".  That is what
> > > > Carl suggested in his mail.  And that is the approach that Stevan
> > > > Harnad and I discussed at last year's OAI-related conference in
> > > > Geneva.  This approach makes sense in that it is extensible: it
> > > > allows other stuff related to the "paper" (for instance usage logs,
> > > > certification metadata, preservation metadata, etc.) to be treated
> > > > in yet other parallel records under the same item.
> > > >
> > > > 2. When it comes to choosing a "metadata format" to describe those
> > > > citations, looking at OpenURL makes a lot of sense.  Not only
> > > > because it is becoming a standard, but because its purpose really IS
> > > > to describe stuff (read "citations" in this context) by building on
> > > > a broad range of identifier-namespaces and a multitude of metadata
> > > > formats.  Moreover, OpenURL allows not only for the description of a
> > > > "citation" but (optionally) also of entities that make up the
> > > > context in which the "citation" appears.  That is very significant
> > > > when thinking about the possibility of open linking at the level of
> > > > OAI service providers. And it is significant when thinking of using
> > > > "OpenURL" as a parallel metadata format, as it allows the citation
> > > > to remain attached to the thing in which it is cited.
> > > >
> > > [...]
> > >
> > > --------------------------------------------------------------------------
> > > Mrs. Ann Apps. Senior Analyst - Research & Development, MIMAS,
> > >      University of Manchester, Oxford Road, Manchester, M13 9PL, UK
> > > Tel: +44 (0) 161 275 6039    Fax: +44 (0) 0161 275 6040
> > > Email: ann.apps@man.ac.uk  WWW: http://epub.mimas.ac.uk/ann.html
> > > --------------------------------------------------------------------------
> > > _______________________________________________
> > > OAI-implementers mailing list
> > > OAI-implementers@oaisrv.nsdl.cornell.edu
> > > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
> > >