[OAI-implementers] XML Schema

Carl Lagoze lagoze@cs.cornell.edu
Fri, 7 Dec 2001 09:32:02 -0500


Leigh,

Thanks for your comments.  Allow me to add my two cents:

As described in the protocol document at
http://www.openarchives.org/OAI_protocol/openarchivesprotocol.html#ListMetadataFormats
and the XML schema for the response to ListMetadataFormats at
http://www.openarchives.org/OAI/1.1/OAI_ListMetadataFormats.xsd, the
response to ListMetadataFormats returns a list of triples:

1. The single token metadata prefix, intended as the local (in the scope
of the responding server) key for the respective metadata format.
2. The URL of the xsd for the metadata format, intended to describe the
data format of the metadata instances returned by this server in
response to requests for the respective metadata format.
3. The namespace URI, intended as the global identification for this
metadata format.
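
As a rough illustration of the triple described above, a client might
read a ListMetadataFormats response along the following lines.  This is
only a sketch: the sample document is made up, and the child element
names metadataPrefix and metadataNamespace are assumptions that should
be checked against the xsd cited above.  Elements are matched by local
name so that the sketch does not depend on the response's own namespace.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class MetadataFormat(NamedTuple):
    prefix: str         # local key, meaningful only on this server
    schema_url: str     # where the xsd for the instance markup lives
    namespace_uri: str  # global identifier of the metadata format


# Made-up response body, trimmed to the parts the sketch needs.
SAMPLE = """<ListMetadataFormats>
  <metadataFormat>
    <metadataPrefix>foo</metadataPrefix>
    <schema>http://foometa.org/foo.xsd</schema>
    <metadataNamespace>http://foometa.org/foo#</metadataNamespace>
  </metadataFormat>
</ListMetadataFormats>"""


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' qualifier, if present."""
    return tag.rsplit("}", 1)[-1]


def parse_formats(xml_text: str) -> List[MetadataFormat]:
    """Collect one (prefix, schema URL, namespace URI) triple per format."""
    root = ET.fromstring(xml_text)
    formats = []
    for elem in root.iter():
        if local_name(elem.tag) != "metadataFormat":
            continue
        # Map each child's local name to its trimmed text content.
        fields = {local_name(c.tag): (c.text or "").strip() for c in elem}
        formats.append(
            MetadataFormat(
                fields["metadataPrefix"],
                fields["schema"],
                fields["metadataNamespace"],
            )
        )
    return formats
```

Calling parse_formats(SAMPLE) yields a single triple whose three fields
correspond to items 1, 2 and 3 in the list above.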

Your comment is correct about the 1st item in the triple, the metadata
prefix: its scope is local to the server.  However, you imply that
we are using the xsd URL as a means of global identification and are
ignoring the semantics of namespace URIs.  In fact, we are not, as
indicated by my descriptions of the 2nd and 3rd items of the triple
above.  So, for example, a client upon getting a triple such as:

(foo, http://foometa.org/foo.xsd, http://foometa.org/foo#) should
interpret this as:

"This server locally uses the name foo to indicate the metadata concept
globally known as http://foometa.org/foo# and marks up instance data for
the metadata concept foo according to the rules defined by
http://foometa.org/foo.xsd."

Another server may return:

(fooalt, http://altfoometa.org/fooall.xsd, http://foometa.org/foo#)
indicating a different markup of the same metadata concept.
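
To make the point concrete, here is a minimal sketch of how a client
could use the namespace URI, rather than the prefix or the xsd URL, to
find out which local prefix to request from each server.  The function
name prefix_for_namespace and the variables server_a and server_b are
hypothetical; the triples are the two examples given above.

```python
from typing import List, Optional, Tuple

# (metadata prefix, xsd URL, namespace URI)
Triple = Tuple[str, str, str]


def prefix_for_namespace(triples: List[Triple], wanted_ns: str) -> Optional[str]:
    """Return the server-local prefix bound to wanted_ns, or None."""
    for prefix, _schema_url, namespace_uri in triples:
        if namespace_uri == wanted_ns:
            return prefix
    return None


# The two example servers from the text: different prefixes and xsd
# URLs, but the same globally identified metadata concept.
server_a = [("foo", "http://foometa.org/foo.xsd", "http://foometa.org/foo#")]
server_b = [("fooalt", "http://altfoometa.org/fooall.xsd", "http://foometa.org/foo#")]

FOO_NS = "http://foometa.org/foo#"
prefix_a = prefix_for_namespace(server_a, FOO_NS)
prefix_b = prefix_for_namespace(server_b, FOO_NS)
```

A client that only understands one particular markup of the concept
could additionally compare the xsd URL before deciding to harvest.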

As for XML schema and RDDL, we have talked about this.  A couple of
points of note here:

1. If I understand correctly, an RDDL instance document is meant to sit
at the end of a namespace URI.  By requiring a namespace URI for each
metadata format, OAI therefore allows extensibility to RDDL: an
implementor can employ RDDL to index multiple descriptions of the
metadata format they use in OAI.
2. Since RDDL is at the end of a namespace URI, there is no conflict
between requiring xml schema and the allowance for RDDL (i.e., other
schema description types - RDFS, RELAX, schematron, etc.)
3. The requirement for an XML schema is quite flexible in that one can
employ it to express rather tight data format constraints for a metadata
instance, or essentially "null it out" and express a schema that allows
any legal xml.

In summary, I think that the current means in OAI-PMH for describing
metadata formats achieves the goals of global identification, flexible
data formatting rules, and extension to other schema tools.

I may be missing something here and welcome any comments.

Carl

> -----Original Message-----
> From: Leigh Dodds [mailto:ldodds@ingenta.com]
> Sent: Friday, December 07, 2001 6:01 AM
> To: Van de Sompel, Herbert; oai-implementers@oaisrv.nsdl.cornell.edu
> Subject: RE: [OAI-implementers] XML Schema
> 
> 
> Hi,
> 
> > This issue is currently being discussed by the oai-tech group, as
> > part of the ongoing revision/stabilization of the metadata
> > harvesting protocol.  we hope to conclude that work with the release
> > of a version 2.0 of the protocol around April/May 2002.
> 
> Thanks, it's useful to know that this is an open issue.
> 
> > it would be good to hear from other implementers on this list
> > whether they see the need to allow for other schema languages for
> > metadata containers in the protocol.
> 
> Another way to ask this question is: should OA-MHP care about what
> schema might be used to validate the metadata returned in a record?
> 
> I think a perfectly valid answer is, no. Some reasoning:
> 
> The protocol is designed to support multiple metadata formats, with
> DC as a minimum. The metadata prefix is a handy way to request
> that a Data Provider returns responses conforming to a particular
> metadata format.
> 
> As prefixes have an undefined scope (they may become standardised,
> they may not), the only identifier for the metadata format that an
> application can currently rely on is the metadata schema (i.e. the
> URI of the schema).
> 
> For example, one can envisage a form of negotiation where a Service
> Provider attempts to identify whether a Data Provider is capable of
> delivering metadata in one format, and if not, fall back to DC. The
> application will therefore need to identify that a given prefix in
> this repository is 'bound' to a format it understands, so that it can
> make this decision.
> 
> An alternative, and well-defined way of identifying a particular
> vocabulary is by its Namespace URI (NS-URI). This identifier has the
> advantage of being agnostic to a particular schema language.
> Technologies such as RDDL [1] provide other useful 'value-added'
> functionality with an NS-URI as a starting point. E.g.
> human-readable documentation, and a machine processable directory of
> resources (that may include multiple schema languages).
> 
> To conclude, one way to resolve this issue is to alter the definition
> of ListMetadataFormats such that the metadataFormat/schema element
> contains not the URI for the schema, but the NS-URI. A 'best practice'
> recommendation to document namespaces using RDDL might also be useful.
> 
> This would make OA-MHP completely agnostic to the particular schema
> language that may be used to validate a response (assuming validation
> takes place at all), while retaining the ability to uniquely define
> the format of metadata required.
> 
> [1] http://www.rddl.org
> 
> Cheers,
> 
> L.