[OAI-implementers] dublin core clarification?

Tim Cole <t-cole3@uiuc.edu>
Fri, 10 May 2002 13:47:22 -0500


Jody-

In my experience the question is not stupid, nor is the answer simple and
uncontroversial, nor is any one practice or solution to this problem
universally implemented.

Unqualified DC is inherently imprecise about the distinctions you're making.
We harvest from about 15 registered OAI metadata providers and also handle
metadata from about a dozen other institutions. Practices are
inconsistent. Some providers use DC elements to describe only the digital
incarnation of a work, some describe only the physical incarnation of a work
(usually because a digital incarnation doesn't exist), most describe
attributes of both physical and digital incarnations in a single metadata
record.

As noted in a previous response, qualified DC allows you to be more precise
and better distinguish between the physical and the digital incarnations in
the space of a single metadata record; however, I disagree that unqualified
DC is inappropriate for describing attributes of a digital version of a
physical object. It is frequently used to describe digital incarnations of
physical objects. (See also the answer to FAQ #12, "Was Dublin Core
metadata designed to be used only to describe digital and Web-based
resources?", on the DCMI site, http://dublincore.org/resources/faq/) The
question of how to do this well, however, is tricky. How much information
about the physical incarnation do you include when describing the digital
incarnation?

An approach that has received some attention in the past is the 1 to 1
approach (one metadata record for each and every resource, where the physical
and the digital incarnations are considered separate and distinct, albeit
related, resources). Examples of ways to handle this issue consistently with
that approach are given in, for instance, the CIMI
Guide to Best Practice for Dublin Core
(http://www.cimi.org/public_docs/meta_bestprac_v1_1_210400.pdf). See in
particular the examples in Appendix A. Though focused on usage within the
museum community, these examples might be of interest.
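To make the 1 to 1 approach concrete, here is a minimal Python sketch (standard library only) applied to the 1869 letter from the original question. It is one possible rendering, not an example taken from the CIMI guide. It assumes the oai_dc and DC 1.1 namespace URIs used by the OAI-PMH 2.0 oai_dc schema, omits schemaLocation attributes, and uses hypothetical urn:example identifiers plus bracketed placeholders where real names and dates would go. Each incarnation gets its own record, and dc:relation and dc:source tie the two together.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the OAI-PMH 2.0 oai_dc schema (assumed for this sketch).
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def build_record(fields):
    """Build an oai_dc:dc element from (element, value) pairs; repeats allowed."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields:
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return root


def first(record, name):
    """Return the text of the first occurrence of a DC element, or None."""
    return record.findtext(f"{{{DC}}}{name}")


# Hypothetical identifiers; any stable identifier scheme would serve.
ORIGINAL_ID = "urn:example:letter-1869"
DIGITAL_ID = "urn:example:letter-1869-digital"

# Record 1 describes only the physical letter.
original = build_record([
    ("identifier", ORIGINAL_ID),
    ("title", "Letter, 1869"),
    ("creator", "[name of letter writer]"),
    ("date", "1869"),
    ("type", "Text"),
    ("relation", DIGITAL_ID),  # points at the digital surrogate
])

# Record 2 describes only the SGML transcription and page images.
digital = build_record([
    ("identifier", DIGITAL_ID),
    ("title", "Letter, 1869: SGML transcription and page images"),
    ("creator", "[name of person who scanned and transcribed]"),
    ("date", "[date of digitization]"),
    ("format", "text/sgml"),
    ("format", "image/jpeg"),
    ("source", ORIGINAL_ID),  # the resource this one was derived from
])

# A harvester can follow dc:source from the surrogate back to the original.
catalog = {first(r, "identifier"): r for r in (original, digital)}
print(first(catalog[first(digital, "source")], "date"))  # 1869
```

The trade-off is that a harvester receives two records and must follow the links itself to reassemble the whole picture.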

However, there are limitations to a purely 1 to 1 approach, particularly for
cross-domain discovery at this point in time using a protocol like
OAI-PMH. Another approach that we've seen used by many OAI providers is to risk
confusion and ambiguity by describing attributes of both the digital and the
physical incarnations all mixed together in a single metadata record. Thus a
single record may include two DC Date elements, one that is the date of
manufacture for the physical object and one that is the date of the creation
of the digital object. The same record may contain two DC Creator elements,
one with the name of the original artist and one with the name of the person
who digitized the work. In unqualified DC there is no machine-understandable
way to distinguish the meanings of the two Dates or the two Creators;
however, many implementers include labels in their DC Date and Creator values
that are at least human-understandable.
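A minimal sketch of such a mixed record for the 1869 letter, again in Python with the standard library, assuming the OAI-PMH 2.0 oai_dc namespaces and using bracketed placeholders. The parenthesized labels are a hypothetical labelling style, not part of any standard. The final lines show what a harvester actually gets from dc:date: two undifferentiated strings.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the OAI-PMH 2.0 oai_dc schema (assumed for this sketch).
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def build_record(fields):
    """Build an oai_dc:dc element from (element, value) pairs; repeats allowed."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields:
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return root


def values(record, name):
    """Return every value of one DC element, in document order."""
    return [e.text for e in record.findall(f"{{{DC}}}{name}")]


# One record mixing physical and digital attributes; the parenthesized
# labels are a hypothetical, human-readable convention only.
mixed = build_record([
    ("title", "Letter, 1869"),
    ("creator", "[name of letter writer] (author of original letter)"),
    ("creator", "[name of digitizer] (scanned and transcribed)"),
    ("date", "1869 (letter written)"),
    ("date", "[date of digitization] (digitized)"),
    ("format", "text/sgml"),
    ("format", "image/jpeg"),
])

# Nothing in unqualified DC tells software which date is which:
# a harvester just receives a list of strings.
for d in values(mixed, "date"):
    print(d)
print(ET.tostring(mixed, encoding="unicode"))
```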

A slightly more sophisticated approach, more like the 1 to 1 approach but
not quite that extreme, is to include attributes about both the physical and
the digital in a single record, but in doing so try to make distinctions
between original physical incarnation attributes and derived digital
incarnation attributes through how certain DC elements are used. Thus the
year a physical artifact was created might show up under DC Coverage, while
only the date the digitization was done shows under DC Date. Or a digital
representation of a 3-dimensional sculpture might list as a DC Creator the
photographer who took the digital photograph and list the sculptor of the
piece as a DC Contributor (instead of DC Creator). The DC Source and Relation
elements can also be used to help distinguish between the attributes of the
digital incarnation and the physical incarnation of a single intellectual
work. We're seeing this less frequently, but when done consistently across a
wide enough domain, it could support more precise searches.
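Here is a sketch of that convention applied to the same 1869 letter, under the same assumptions (OAI-PMH 2.0 oai_dc namespaces, bracketed placeholders). The original's year goes in dc:coverage, dc:date carries only the digitization date, the digitizer is the dc:creator and the letter writer a dc:contributor, and dc:source points at the original. This mapping is a local convention, not something unqualified DC defines, so the hypothetical search helpers below only give sensible answers for records that follow it.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the OAI-PMH 2.0 oai_dc schema (assumed for this sketch).
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def build_record(fields):
    """Build an oai_dc:dc element from (element, value) pairs; repeats allowed."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields:
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return root


# Hypothetical local convention: each element is reserved for one incarnation.
record = build_record([
    ("title", "Letter, 1869"),
    ("creator", "[name of person who scanned and transcribed]"),  # digital
    ("contributor", "[name of letter writer]"),  # original author
    ("date", "[date of digitization]"),  # digital incarnation only
    ("coverage", "1869"),  # year the physical letter was written
    ("format", "text/sgml"),
    ("format", "image/jpeg"),
    ("source", "[citation or identifier for the original letter]"),
])


def original_dated(record, year):
    """True if, under this convention, the original dates from `year`."""
    return any(e.text == year for e in record.findall(f"{{{DC}}}coverage"))


def digitizers(record):
    """Under this convention, dc:creator names only the digitizers."""
    return [e.text for e in record.findall(f"{{{DC}}}creator")]


print(original_dated(record, "1869"))  # True
print(digitizers(record))
```

Because dc:date no longer mixes the two incarnations, a query for letters written in 1869 can target dc:coverage without also matching digitization dates.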

An interesting discussion of some of these issues appeared in a D-Lib
Magazine article by Carl Lagoze (January 2001):
http://www.dlib.org/dlib/january01/lagoze/01lagoze.html

Anyway, the bottom line is that this issue remains a difficult one and is a
hindrance to more effective cross-domain aggregation of metadata. There's
lots of opportunity to improve the situation, but I don't think we've yet
reached a stage of universal agreement, so there may not be a simple answer
for you. Look at what's been done in other domains and by other implementers,
and take a sensible approach that builds on their practices and on the
limited consensus that has been reached so far.

Tim Cole
University of Illinois at Urbana-Champaign

----- Original Message -----
From: "deridder" <deridder@cs.utk.edu>
To: "OAI Implementors" <oai-implementers@oaisrv.nsdl.cornell.edu>
Sent: Friday, May 10, 2002 10:46 AM
Subject: [OAI-implementers] dublin core clarification?


>
>  I know this is a simple and perhaps stupid question, but I'd rather
> *look* stupid than *be* stupid.  We have some confusion over here:
>
>  Do the Dublin Core elements describe the original item, or the
>     digitized objects made from that item?
>
> The DC webpages simply say all the elements refer to the "resource",
> but I'm not clear as to whether that "resource" is the digital one,
> or the original item.
>
>  Example:
>    A multi-page letter, written in 1869, is the original document;
>    An SGML transcription with accompanying JPEGs for each page,
>       comprise the digitized object.
>
>  Does the "date" element refer to the creation of the digitized object,
>    or the date the letter was written?
>  Does the "creator" refer to the person who scanned the document and
>    transcribed its contents, or does it refer to the original author
>    of the letter?
>  Does the "format" list text/sgml and JPEGs, or simply text?  And
>    if this original document was an old JPEG about which metadata
>    was created, is the correct format JPEG and that metadata format?
>    Or just "photograph"?
>
>
>  I have found we have a mix of both in our files at present, and I need
>  some clarity so I can clean up this mess and prevent future madness!!
>   (I would like my scripts and repository to be correct!!!)
>
>
> Thank you in advance!!!!
>
>    --jody
>
>
>
> ***********************************************************
>    PGPKey: http://www.cs.utk.edu/~deridder/jd-pgp.txt
> ***********************************************************