[OAI-implementers] Qualified Dublin Core

Stephen Crawley crawley at dstc.edu.au
Thu Aug 12 22:33:21 EDT 2004

Hi Tim,

> I'm not sure what is meant here. You seem to be implying a way that I can send
> OAI harvesters metadata consisting entirely of elements that have no
> community-standard semantic labels, which seems counter-intuitive. Even if
> some of my metadata elements make sense only to my local application, don't
> I want to label at least those other elements in my record that do
> correspond to community-standard semantics with the namespace and element
> names of that standard semantic schema(s)?  

That is not how it works.

The schema identifier supplied with the harvested record IMPLICITLY
takes you to a set of community-standard semantic labels for element
names, encodings and language that are used in the record.  Provided
that you / your software understands this space, you then (try to)
map from the semantic space of labels (and values) of the harvested
record to the semantic space of your own metadata schema.  If you
can't, then the record is not immediately useful to you.

> > In the simple model, 
> > the OAI repository assembles XML records containing whatever 
> > elements it is prepared to publish. The OAI client would then 
> > sort through the supplied elements, throwing away any that it 
> > doesn't want / understand, and massaging others as required.
> > Then the client validates the filtered records against its 
> > own metadata schema before it decides what to do with them.
> I agree with Jeff to this extent, if none of the metadata elements in a
> harvested metadata record are labeled with an element name I know prefixed
> by a metadata schema namespace I know, I'll have to throw all the elements
> away, and the record will therefore be worthless to me.
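The "simple model" quoted above (the client discards elements it does not want or understand, massages the rest, then validates against its own schema) might look roughly like the following. This is only a sketch: the local schema (`LOCAL_ALLOWED`, `LOCAL_REQUIRED`) and the whitespace clean-up in `massage` are hypothetical stand-ins, and a real client would validate against its actual schema (for example an XSD) rather than a pair of Python sets.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical local schema: element names this client accepts, and which are mandatory.
LOCAL_ALLOWED = {"title", "creator", "date", "subject"}
LOCAL_REQUIRED = {"title", "creator"}


def massage(value: str) -> str:
    """Illustrative clean-up step (an assumption): collapse runs of whitespace."""
    return " ".join(value.split())


def filter_and_validate(
    elements: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Drop unknown elements, massage the rest, and report missing required ones."""
    kept: Dict[str, List[str]] = {}
    for name, value in elements:
        if name not in LOCAL_ALLOWED:
            continue  # an element the client does not want / understand
        kept.setdefault(name, []).append(massage(value))
    # Iterating a dict yields its keys, so this is "required minus present".
    missing = sorted(LOCAL_REQUIRED.difference(kept))
    return kept, missing
```

As Tim notes, if nothing survives the filter the record is worthless to this client; the `missing` list is where a client would decide whether to reject or repair the record.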


> > The problem of elements meaning different things in different 
> > schemas is a bit tricky.  
> Yes, an element called <spatial> might mean one thing in metadata schema A
> and something quite different in metadata schema B. As a harvester, if I
> encounter an element <spatial> that is not tied to the dcterms schema and
> namespace (or some other namespace I know) I would always discard it rather
> than assume it means the same thing as <dcterms:spatial>. It's not a safe
> assumption that just because something is labeled <spatial> in a local
> schema it means the same thing as <spatial> in dcterms. 

Obviously!  Performing ANY mapping / filtering when you do not know
the relationship between the source and target schemas is
fundamentally unsound.
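The harvester-side policy Tim describes (keep an element only when it is tied to a namespace the harvester knows, and never assume a bare `<spatial>` means `<dcterms:spatial>`) can be sketched with Python's standard `xml.etree.ElementTree`. The two Dublin Core namespace URIs are the published ones; the flat record layout and the `known_elements` helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
KNOWN_NAMESPACES = {DC, DCTERMS}


def known_elements(record_xml: str) -> List[Tuple[str, str, str]]:
    """Return (namespace, local name, text) for child elements in a known namespace.

    Unqualified elements, and elements in namespaces the harvester does not
    recognise, are discarded rather than guessed at.
    """
    root = ET.fromstring(record_xml)
    kept = []
    for child in root:
        ns: Optional[str] = None
        local = child.tag
        if child.tag.startswith("{"):  # ElementTree encodes a namespace as "{uri}local"
            ns, local = child.tag[1:].split("}", 1)
        if ns in KNOWN_NAMESPACES:
            kept.append((ns, local, (child.text or "").strip()))
    return kept
```

Note that this only decides which elements to *keep*; it does not map anything, which is consistent with the point that mapping without knowing how the schemas relate is unsound.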

> ... That's why XML
> namespaces are so handy. The data provider can explicitly and unambiguously
> tie an element in his or her record to one specific, community-standard
> metadata semantic set.

I disagree.  The XML namespaces (i.e. OAI record formats) are actually a
LOSSY way of expressing semantics.  Or at least that's what happens in
practice ... when people try to shoe-horn metadata into some existing
OAI record schema that isn't quite right.

My point is that a real metadata schema includes something that says
what the elements, refinements, encodings, etc. all mean.   Currently,
that something is usually English text, but in the future it might
be augmented with machine readable cross-references to standard thesauri,
ontologies, etcetera.

Present-day OAI-style XML schemas are not metadata schemas.  Rather, they
are formats for transporting metadata records that may (or may not)
fully conform to some real metadata schema.   Other formats include RDF,
HTML meta tags, domain-specific formats such as MARC and ANZLIC, and even
clunky ad-hoc mappings to spreadsheets.

> > However, you can get some traction 
> > if each record's metadata schema identifier is included in the record.
> >
> But XML and XML Schema Language already have standard, well-defined
> mechanisms (namespaces and the ability to import or include other XSDs
> within your XSD) that make it easy to identify the semantic set with which
> any given element in a metadata instance is associated. Why not just use
> those XML standard approaches?

See above.  XML schemas are not metadata schemas.  And namespace importing
is not rich enough to express the subtle semantic relationships that can
exist between different metadata schemas.

If we were to treat XML schemas as de facto metadata schemas, then anyone
who manages the "primary production" of metadata might need to define
their own XML schema.  Imagine what impact this would have in terms of
"balkanization" of the OAI harvesting world.

(Note: this is a real issue.  Nearly all of our customers have defined
their own metadata schemas. These are typically based on DCQ or AGLS,
but most have additional elements and / or encoding schemes, and most
refine the meaning of the standard elements in subtle ways.)

What I am proposing is a way to avoid this balkanization by defining a
means of interchanging metadata independently of its metadata schema.
What an OAI client does with this metadata is ... up to the client. If
the client understands the source metadata schema, it could try to map
the record's elements, etcetera.  Alternatively, the client could simply
store the records as is, including the elements, encodings, and values
that it does not understand or that it thinks are invalid.
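One way to picture the client side of this proposal: each interchanged record carries its source metadata schema identifier plus a list of (element, refinement, encoding, value) statements, and the client either maps it (if it has a mapper for that schema) or archives it untouched. This is a hypothetical sketch, not the format in the attached schema; the `Statement` and `InterchangeRecord` types, the `ingest` and `dcq_mapper` functions, and the example schema URIs are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Statement:
    element: str               # e.g. "coverage"
    refinement: Optional[str]  # e.g. "spatial"
    encoding: Optional[str]    # e.g. "ISO3166"
    value: str


@dataclass
class InterchangeRecord:
    schema_id: str  # identifies the source *metadata* schema, not an XML format
    statements: List[Statement] = field(default_factory=list)


Mapper = Callable[[InterchangeRecord], Dict[str, str]]


def ingest(record: InterchangeRecord, mappers: Dict[str, Mapper],
           archive: List[InterchangeRecord]) -> Optional[Dict[str, str]]:
    """Map the record if its source schema is understood; otherwise store it as is."""
    mapper = mappers.get(record.schema_id)
    if mapper is None:
        archive.append(record)  # unknown schema: keep every element, encoding and value
        return None
    return mapper(record)


def dcq_mapper(record: InterchangeRecord) -> Dict[str, str]:
    """Hypothetical mapper: flatten element.refinement -> value, ignoring encodings."""
    out: Dict[str, str] = {}
    for s in record.statements:
        key = s.element if s.refinement is None else f"{s.element}.{s.refinement}"
        out[key] = s.value
    return out
```

Because the schema identifier travels with the record, the decision to map or merely store is made per source schema, without the client needing a distinct XML transport format for every locally refined schema.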

> Well, the schema you attached failed XML well-formedness checks (apparent typos on
> lines 64, 102, and 110; you also appear to have left out the attribute name
> [i.e., "value"] on lines 110, 111, and 117). There also appears to be a
> glitch (at least according to XSV) with the complexType definition of
> REG:RecordType. So, I'm not sure I understand exactly what you're trying to
> do with your schema. But from looking at it, and from what you said about it in
> your note, it seems to me like just another way to do much the same thing
> that can be done in a more XML-standard way by importing/including other
> schemas and namespaces of interest in your local XSD and labeling the
> elements in your metadata record instances accordingly. 

All I can say is that it is sufficiently well-formed for the XML parsers
in JDK 1.4.2 / jaxp-1.2 to understand it ...

-- Steve

| Stephen Crawley                  | MetaSuite Project Leader
| Level 7, GP South Building (78)  | Distributed Systems Technology CRC
| Staff House Road                 | Tel   : +61 7 3365 4310
| The University of Queensland     | Fax   : +61 7 3365 4311
| Queensland 4072                  | Email : crawley at dstc.edu.au
| Australia                        | WWW   : http://www.dstc.edu.au
|                                  | DSTC is the Australian W3C Office
