[OAI-implementers] rdf

lagoze@cs.cornell.edu
Wed, 2 May 2001 07:15:19 -0400


Sorry to enter this dialog a little late; I was out of town.

I have a question about the goal here of "passing RDF" via OAI
protocols. 

A quick catch-up to make sure we're all on the same page: RDF, first
and foremost, as described in the RDF Model and Syntax document (M+S)
http://www.w3.org/TR/REC-rdf-syntax/, is a data model describing typed
relationships between uniquely identified entities.  This data model is
often described in terms of a directed labeled graph, but it can also
be expressed in a number of other representations, e.g. as triples with
a subject, predicate (verb), and object.  Among these representations
is an XML syntax for serializing RDF graphs that makes use of the RDF
M+S namespace at http://www.w3.org/1999/02/22-rdf-syntax-ns#.
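For those who think in code, here is a minimal Python sketch of the
triple view just described: each statement is a (subject, predicate,
object) tuple, and indexing them by subject gives the directed, labeled
graph.  The identifiers are illustrative only ("URI:R" is the
placeholder resource from the example later in this note, and the DC
1.0 namespace is the one used there); nothing here is an RDF library
API.

```python
from collections import defaultdict

# Illustrative identifiers: "URI:R" is a placeholder resource, and the
# DC namespace is the Dublin Core 1.0 element set URI.
DC = "http://purl.org/dc/elements/1.0/"
RESOURCE = "URI:R"

# Each RDF statement is a (subject, predicate, object) triple.
triples = [
    (RESOURCE, DC + "Title", "CIMI Presentation"),
    (RESOURCE, DC + "Creator", "Eric Miller"),
]

def as_graph(statements):
    """Index triples as a directed labeled graph:
    subject -> list of (predicate label, target) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return dict(graph)

graph = as_graph(triples)
for predicate, obj in graph[RESOURCE]:
    print(predicate, "->", obj)
```

The same two statements could be drawn as a node with two labeled arcs
or written out in the XML syntax; the model is the triples, not any one
of those notations.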

In addition, there is an RDF schema specification at
http://www.w3.org/TR/2000/CR-rdf-schema-20000327/ that uses the notions
in RDF M+S to express knowledge about semantic relationships.  I'll
quickly note that while RDF "schema" and XML "schema" are both "schema",
comparing them is a little like comparing apples and oranges.
XML schema can generally be thought of as a data validation tool,
allowing one to specify the structure of an XML data stream, with the
ability to express rather detailed constraints on tree structure, data
types, etc.  RDF schema should be thought of as a tool for ontology
definition, making it possible to express class/sub-class relationships
and property/sub-property relationships.  For example, in RDF schema one
could state that a concept from one namespace (e.g., a "CREATOR" in
Dublin Core) is a "type of" a concept in another namespace (e.g., an
"AGENT").  There are some constraint mechanisms in RDF schema, but they
are fairly weak and not the major goal of RDF schema.
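To make the "type of" relationship concrete, the sketch below applies
the RDF schema sub-property rule by hand: if Creator is declared a
sub-property of some agent property, every Creator statement also
implies an agent statement.  The http://example.org/vocab# namespace
and its "agent" property are hypothetical stand-ins for the "AGENT"
concept, and the RDFS namespace URI is assumed to be
http://www.w3.org/2000/01/rdf-schema#.  Note that nothing here
validates anything; the schema only adds knowledge.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"   # assumed RDFS namespace
DC = "http://purl.org/dc/elements/1.0/"
EX = "http://example.org/vocab#"  # hypothetical namespace defining "agent"

# Schema-level statement: DC Creator is a "type of" agent.
schema = [
    (DC + "Creator", RDFS + "subPropertyOf", EX + "agent"),
]
# Instance-level statement about a resource.
data = [
    ("URI:R", DC + "Creator", "Eric Miller"),
]

def super_properties(prop, schema_triples):
    """Return every property reachable from prop via subPropertyOf,
    following the relation transitively (cycles are tolerated)."""
    found, frontier = set(), [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in schema_triples:
            if s == current and p == RDFS + "subPropertyOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def infer(data_triples, schema_triples):
    """Apply the rule: (s p o) and (p subPropertyOf q) => (s q o)."""
    inferred = set(data_triples)
    for s, p, o in data_triples:
        for q in super_properties(p, schema_triples):
            inferred.add((s, q, o))
    return inferred

for triple in sorted(infer(data, schema)):
    print(triple)
```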

My colleague Jane Hunter did an excellent job comparing the
functionality of the two schema mechanisms in a paper she is presenting
at WWW10 this week in Hong Kong; it is available at
http://archive.dstc.edu.au/RDU/staff/jane-hunter/www10/paper.html.

I should also note that there are other "schema languages" floating
around in the same space as XML schema: Schematron, RELAX, and of course
good old DTDs.

We have employed XML schema at two levels in the OAI protocol:

1. To define the format of responses to all OAI protocol requests.
2. To define the format of metadata streams embedded in the GetRecord
and ListRecords responses.  

In both cases our goal was to provide a mechanism for some degree of
data validation.  I say "some degree" since conformance to a schema does
not guarantee the integrity of the data (e.g. I can create Dublin Core
that is complete nonsense even though it conforms to the oai-dc schema).
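A toy illustration of the "some degree" caveat: Python's standard
library has no XML Schema validator, so the hypothetical
structurally_valid() function below stands in for one by checking only
element namespaces, element names, and non-empty content.  The
"record" root element and the three allowed DC elements are
assumptions for the example, not the actual oai-dc schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.0/"

def structurally_valid(xml_text, allowed=("Title", "Creator", "Date")):
    """Toy stand-in for schema validation: every child of the root must
    be an allowed Dublin Core element with non-empty text.  This checks
    shape only, never meaning."""
    root = ET.fromstring(xml_text)
    for child in root:
        # ElementTree writes namespaced tags as "{namespace}local".
        ns, _, local = child.tag[1:].partition("}")
        if ns != DC_NS or local not in allowed or not (child.text or "").strip():
            return False
    return True

# Well-formed, uses only allowed DC elements, and is complete nonsense.
nonsense = f"""<record xmlns:dc="{DC_NS}">
  <dc:Title>1999-13-45</dc:Title>
  <dc:Creator>The colour blue</dc:Creator>
  <dc:Date>Eric Miller</dc:Date>
</record>"""

print(structurally_valid(nonsense))  # True: it conforms, yet says nothing true
```

A real XSD can express far richer constraints than this, but the
limitation is the same: it constrains shape, not truth.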


Now on to the issue of "passing RDF metadata in OAI responses" in
particular.  I need to understand the motivation for this as I evaluate
our choices in OAI with Herbert and other people in the OAI community.
Is it:

1. A desire to mix multiple namespaces in a metadata record (e.g., mix
dc tags with GEM tags).  In the DC community this is currently called
devising an "application profile".
2. A concern that XML schemas are too tightly constraining.  This has
been raised on a number of mailing lists, where the issue is that the
concept of Dublin Core (as expressed by a namespace URI) is distinct
from any particular data formatting.  (These identity issues run
throughout the AI community; e.g., Carl the child and Carl the adult
have different forms but are the same concept.)
3. The fact that some places have metadata stored in XML RDF and just
want to expose that without further processing.  By this I mean that the
metadata looks like:

<RDF xmlns = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc = "http://purl.org/dc/elements/1.0/">
   <Description about = "URI:R">
     <dc:Title> CIMI Presentation </dc:Title>
     <dc:Creator> Eric Miller </dc:Creator>
   </Description>
</RDF>

4. A desire to use other primitives in the RDF and/or RDFS namespaces,
such as the container primitives (Bag, Seq, Alt).
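Motivation 3 is a reminder that the XML is only one serialization of
the underlying triples.  As a rough sketch, the Python below recovers
(subject, predicate, object) triples from a record shaped like the one
in item 3, assuming the RDF M+S namespace
(http://www.w3.org/1999/02/22-rdf-syntax-ns#) and only the simple
Description/property pattern.  It is not a general RDF/XML parser: it
ignores containers, nested descriptions, and resource-valued
properties.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

record = f"""<RDF xmlns="{RDF_NS}"
     xmlns:dc="http://purl.org/dc/elements/1.0/">
  <Description about="URI:R">
    <dc:Title> CIMI Presentation </dc:Title>
    <dc:Creator> Eric Miller </dc:Creator>
  </Description>
</RDF>"""

def description_triples(xml_text):
    """Extract triples from the simple Description/property pattern only."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Accept both the unqualified 'about' and the qualified rdf:about.
        subject = desc.get("about") or desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # "{namespace}Local" -> "namespaceLocal", the RDF predicate URI.
            predicate = prop.tag.lstrip("{").replace("}", "")
            triples.append((subject, predicate, (prop.text or "").strip()))
    return triples

for triple in description_triples(record):
    print(triple)
```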

Addressing each of these:

1. It is not necessary to use RDF to mix elements from multiple
namespaces.  One can write an XML schema to allow this.  
2. A colleague closely involved in the RDF community has criticized the
OAI protocol on this basis.  His claim is that XML schema is problematic
both because of its complexity and because data format validation is
much less important than concept identity (i.e., for a thing to be
Dublin Core it shouldn't have to look one exact way).  In fact, the
distinction between namespaces and schema expressions demonstrates this.
A namespace URI is a different animal than the URL of a schema.  The
former is a unique identity for a concept, not necessarily resolvable to
any concrete expression.  The latter is a concrete meta-definition of
that concept.  Emerging technologies like RDDL express the fact that an
abstract concept (Dublin Core) can have multiple concrete
meta-definitions (e.g., a natural language description, an RDF schema,
a Schematron schema, two XML schemas, etc.).
3. Wrapping metadata in RDF tags doesn't make it "RDF".  As said
earlier, RDF is really much more than a set of XML tags.  If this is
indeed the motivation here, I'd humbly suggest stripping off the outer
RDF tags before embedding the metadata in an OAI response.
4. The use of other RDF primitives in the metadata description starts to
make this more interesting and I'd like to understand more of the
particulars.  
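For motivation 3, stripping the outer RDF tags can be as simple as the
Python sketch below, which collects the property elements from each
Description and places them in a plain container element.  The
container name "record" is hypothetical (a real OAI response would use
whatever element the target metadata schema defines).  Note that the
Description's "about" URI is discarded, which is only harmless if the
record was never really using RDF semantics in the first place.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.0/"

def strip_rdf_wrapper(xml_text, container_tag="record"):
    """Drop the RDF and Description wrappers, keeping only the property
    elements inside a plain (hypothetical) container element."""
    root = ET.fromstring(xml_text)
    container = ET.Element(container_tag)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        container.extend(list(desc))  # the subject URI is not carried over
    return container

# Serialize DC elements with a readable "dc" prefix instead of "ns0".
ET.register_namespace("dc", DC_NS)

wrapped = f"""<RDF xmlns="{RDF_NS}" xmlns:dc="{DC_NS}">
  <Description about="URI:R">
    <dc:Title> CIMI Presentation </dc:Title>
    <dc:Creator> Eric Miller </dc:Creator>
  </Description>
</RDF>"""

plain = strip_rdf_wrapper(wrapped)
print(ET.tostring(plain, encoding="unicode"))
```

The result is ordinary namespaced XML that a conventional XML schema
can describe, with no RDF schema needed.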

In closing, creating an XML schema for an RDF stream so that it can be
embedded in OAI protocol responses seems, in my opinion, not to be the
best approach.  It has the flavor of mixing apples and oranges.  If,
indeed, the desire is to use some of the semantic expression
functionality of RDF, then we in the OAI community need to reconsider
our rather tight commitment to XML schema.  I'd love to see continued
discussion about this.

I hope this all helps.  Sorry for the very long note, but there are so
many intertwined issues that making them explicit is often the best
approach.

Carl

> -----Original Message-----
> From: Eric Lease Morgan [mailto:eric_morgan@ncsu.edu]
> Sent: Monday, April 30, 2001 8:53 PM
> To: oai-implementers@oaisrv.nsdl.cornell.edu
> Subject: Re: [OAI-implementers] rdf
> 
> 
> Thomas G. Habing <thabing@uiuc.edu> wrote:
> 
> >> It sort of sounds to me that in order to create alternative metadata formats
> >> to be used in OAI one must create an XSD -- a schema, and RDF does not
> >> cleanly fit into schemas. Correct?
> >> 
> > Yes and not entirely sure yet.  OAI requires an XML Schema be available in
> > order to validate (check the correctness) of any alternate metadata formats.
> > But, after sending the previous message, I was able to find an XML Schema for
> > RDF (http://www.w3.org/2000/07/rdf.xsd), but I haven't had a chance to test it
> > yet.  I suspect it will require some tweaking in order to work with the latest
> > XML Schema spec (a moving target).  Plus, schemas for any other namespaces
> > that you intend to embed in the RDF will also have to be developed.
> > 
> > Anyway, this is something that we are actively pursuing, and are happy to
> > share once we figure out more ourselves.
> 
> This looks promising. I will explore it as well. Thank you, and don't
> hesitate to pass along anything you happen to learn.
> 
> -- 
> Eric Lease Morgan
> 
> _______________________________________________
> OAI-implementers mailing list
> OAI-implementers@oaisrv.nsdl.cornell.edu
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
>