[OAI-implementers] XML Schema for OAI compliance with NSDL harvesting

John Perkins jperkins@fox.nstn.ca
Thu, 13 Dec 2001 13:58:03 -0400


I'd like to chip in on this from what is perhaps our rather naive view,
in CIMI, of how this all might work, and from our understanding of the
intentions of OAI.

One of the reasons we were excited by OAI was that it gave a clean way
to address interoperability issues across communities, by mandating DC,
while still allowing communities of implementers to adopt additional
metadata element sets: to describe things more richly, to describe
things for which simple DC was clearly inadequate, or to use some
hybrid of DC plus other elements.

In the museum community CIMI ran a two-year testbed that resulted in
the publication of a guide to best practice for the use of simple DC
for the purposes of discovery.  We noted that significant richness
was lost and that it was problematic for many museum applications, but
for the purposes of museums and other communities discovering museum
resources it would work fine. What we gained by many communities
adopting DC for discovery would far outweigh the lack of depth and
breadth at the discovery level.  We also concluded that using qualified
DC added a level of complexity that bought us nothing more at the
discovery level, so we suggested museums not use it.

Instead we are developing other metadata formats for describing 
collections and items. One of these is the SPECTRUM standard for 
which we are deep into a schema definition. In our OAI work we intend 
to support this as a community specific metadata format extension, 
publishing the schema so others who want to understand it or use it 
can do so. We also expect that some in our community will want to use 
hybrid schemas such as DC simple plus DC:education elements such as 
audience level.  That's fine, but we feel each of those should have
its own schema and be considered an extension set, not a replacement
for OAI's core set, DC.

We hope the OAI Tech/Implementers group will not complicate matters
by dropping the current simple DC schema as the core required schema.
I think this is what Carl is saying in a message that arrived after I
started writing this, which I have snipped in here.

From: "Carl Lagoze" <lagoze@cs.cornell.edu>
To: "Rachel Heery" <r.heery@ukoln.ac.uk>,
Sender: oai-implementers-admin@oaisrv.nsdl.cornell.edu
That is fine for intra-profile interoperability.  Our problem in OAI is
that, as you know, we would like to establish a modicum of cross-domain
interoperability.  Therein lies the discomfort of loosening our
mandatory DC, defined by a reasonably simple schema as in
http://www.openarchives.org/OAI/1.1/dc.xsd, into DC that follows an
application profile schema.  As far as I know, and correct me if I'm
wrong, interoperability amongst application profiles is uncharted
territory.

That said, for the OAI to continue with the concept of a "single
required metadata format for interoperability", it appears that good old
simple DC or some other suitable cross-domain metadata format (not
embedded in multiple application profiles) is the only feasible choice.


  At 7:34 AM -0500 12/13/01, Carl Lagoze wrote:
>Thanks for the note.  Since I am closely involved in both OAI and NSDL,
>I was thinking of this issue also.  I hope you don't mind that I've
>reflected this issue back to the tech group and to Diane Hillmann, my
>colleague at Cornell who is the metadata specialist in NSDL. 
>I see us getting drawn into the implementation end of what the dc
>community calls "application profiles".  For a brief bit of background
>to the rest of the group:
>- DC started out with this concept of 15 elements with some fuzzy
>thinking that these elements would make up some kind of interoperable
>metadata "record", defined as a packaging of the elements e.g., as a set
>of meta tags, or xml file, etc.  As noted many times, all elements are
>optional and repeatable in instances of such records, and the data types
>of element values are undefined.  This allows some measure of
>interoperability among "dc records", albeit pretty low, essentially saying
>"in a set of many 'dc records' you will only find the specified 15
>elements, but no guarantee which ones will be there".  Essentially, this
>is what we've tried to achieve with mandatory dc and dc.xsd in OAI-PMH.
>- DC then moved on to the notion of "qualified dc", where semantics or
>value constraints on DC could be tightened.  In this process, there was
>still some fuzzy thinking that there might be a "dc record" but instead
>of being made up of statements like "date is Sept. 1, 2001", it would
>consist of statements like "date, created, is 2001-09-01, in ISO 8601
>format".  The previous loose interoperability assumption still existed,
>with the added "dumb-down" notion specifying "you will find something
>more constrained than a standard dc element/value pair and you will be
>able to map it back to its unqualified form".  As we all know, in
>OAI-PMH we decided
>- The latest DC thinking now includes the notion of "application
>profiles", saying that a "metadata record" may include dc elements (all
>optional, all repeatable) that can be mixed and matched with metadata
>elements from one or more other metadata vocabularies.  The nature of
>the mixing is, I believe, unspecified by DC - i.e., the application
>profile notion may allow for a dc element to be nested within the value
>structure of an element from another vocabulary (such as the dc-ed
>vocabulary mentioned in Tim's note).  For example, imagine a metadata
>record such as:
>	<dc.date>2000-01-01</dc.date>
>	<blatz.personality>nasty</blatz.personality>
>Interoperability in this world of application profiles now becomes
>"interoperability among a set of records conforming to several
>application profiles means that somewhere in each of those records are 0
>or more dc elements".  Of course, one could restrict interoperability
>to a single application profile, which is presumably defined by some
>schema, but then the presumed cross domain interoperability purpose of
>dc is seemingly lost.
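To make the three notions of a "dc record" above concrete, here is a minimal Python sketch. It is a simplified model, not anything defined by DCMI or OAI: a record is assumed to be a list of (name, value) pairs spelled like the example record (dc.date, blatz.personality), and the qualifier spelling dc.date.created is a hypothetical convention chosen only for illustration. The hypothetical function dumb_down maps a qualified element back to its unqualified form, and simple_dc_view keeps only the DC statements from a mixed, application-profile record.

```python
# Simplified, hypothetical model of DC "records" as (name, value) pairs.
# Names follow the "vocabulary.element[.qualifier]" spelling of the example
# record above; the qualifier form (dc.date.created) is an assumption.

DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dumb_down(name: str) -> str:
    """Map a qualified DC name such as 'dc.date.created' to 'dc.date'."""
    parts = name.split(".")
    if parts[0] == "dc" and len(parts) > 2:
        return ".".join(parts[:2])  # drop the qualifier, keep the element
    return name  # unqualified DC or another vocabulary: unchanged


def simple_dc_view(record: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return only the (dumbed-down) simple DC statements in a record.

    Elements from other vocabularies are dropped, so for an
    application-profile record the result may hold zero or more
    DC elements and nothing else.
    """
    view = []
    for name, value in record:
        prefix, _, element = dumb_down(name).partition(".")
        if prefix == "dc" and element in DC_ELEMENTS:
            view.append((f"dc.{element}", value))
    return view


if __name__ == "__main__":
    record = [
        ("dc.date.created", "2000-01-01"),
        ("blatz.personality", "nasty"),
    ]
    print(simple_dc_view(record))  # [('dc.date', '2000-01-01')]
```

Applied to a record with no DC elements at all, the view is simply empty, which is exactly the "0 or more dc elements" level of interoperability described above.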
>In the OAI world, I see we are left with the following options:
>1. Remain with our original, albeit low-level, notion of dc
>interoperability - demand through a schema a record of unqualified dc
>2. Loosen the schema to allow qualified dc
>3. Allow for full-blast dc "application profiles": dc elements may be
>mixed with others from other vocabularies in unconstrained ways,
>undefined by a schema.
>4. Find some proper midpoints - for example, create an xml schema that
>specifies 0 or more dc elements at the top level, perhaps qualified,
>that are mixed with other top-level elements.
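The difference between options 1 and 4 can be sketched as two acceptance rules over a record's top-level elements. This is a Python approximation of the constraints each kind of schema would express, not an XSD and not the actual oai_dc.xsd; qualified DC is left out for brevity. The dc-ed namespace URI below is a placeholder (the thread does not give it), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# Placeholder namespace for dc-ed:audience; the real URI is not in the thread.
DCED_NS = "http://example.org/dc-ed/"

DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def top_level(xml_text: str) -> list[tuple[str, str]]:
    """Return (namespace, local name) for each child of the record root."""
    pairs = []
    for child in ET.fromstring(xml_text):
        tag = child.tag
        if tag.startswith("{"):  # ElementTree spells namespaced tags {ns}local
            ns, _, local = tag[1:].partition("}")
        else:
            ns, local = "", tag
        pairs.append((ns, local))
    return pairs


def accepts_option1(xml_text: str) -> bool:
    """Option 1: only the 15 unqualified DC elements may appear."""
    return all(ns == DC_NS and local in DC_ELEMENTS
               for ns, local in top_level(xml_text))


def accepts_option4(xml_text: str, allowed_ns: set[str]) -> bool:
    """Option 4: 0+ DC elements mixed with elements from allowed namespaces.

    Qualified DC is not modeled in this sketch.
    """
    for ns, local in top_level(xml_text):
        if ns == DC_NS:
            if local not in DC_ELEMENTS:
                return False
        elif ns not in allowed_ns:
            return False
    return True


RECORD = f"""<record xmlns:dc="{DC_NS}" xmlns:dc-ed="{DCED_NS}">
  <dc:title>Example resource</dc:title>
  <dc:date>2001-12-13</dc:date>
  <dc-ed:audience>undergraduate</dc-ed:audience>
</record>"""

if __name__ == "__main__":
    print(accepts_option1(RECORD))             # False: dc-ed:audience
    print(accepts_option4(RECORD, {DCED_NS}))  # True
```

Under this sketch, a record carrying the dc-ed:audience element discussed elsewhere in this thread fails the option 1 rule but passes option 4 once its namespace is listed as allowed; in an actual XSD the option 4 rule would typically be expressed with element wildcards.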
>My view, subjective as usual, is:
>1. our original goal of mandatory dc for interoperability in OAI-PMH is
>looking somewhat threadbare anyway, in light of the fact that dc as a
>record format is inappropriate for non-bibliographic items - people,
>2. the application profile stuff sounds fine from a conversational
>perspective, but at the level of implementation and interoperability it
>is far from clean.  I'm not certain what a schema defining the rules for
>interoperability among application profiles would look like.
>Perhaps Andy Powell can kick in with some thinking since he is more
>involved in DCMI than I am.  Or Diane Hillmann whose NSDL and DCMI ties
>are close.
>>  -----Original Message-----
>>  From: Tim Cole [mailto:t-cole3@uiuc.edu]
>>  Sent: Wednesday, December 12, 2001 6:43 PM
>>  To: Carl Lagoze
>>  Cc: Thomas G. Habing
>>  Subject: Re: [OAI-implementers] XML Schema for OAI compliance with NSDL harvesting
>>  Carl-
>>  I know XML Schema was one of last week's Tech Committee topics, but
>>  Zubair's note suggested an additional issue the Committee might want
>>  to consider:
>>  The way I read the NSDL metadata recommendation, they're endorsing the
>>  addition of a dc-ed:audience element to the base 15 elements of simple
>>  DC.  Metadata files including this element will not validate against
>>  either the current or our proposed oai_dc.xsd.  If OAI decides to
>>  revise the oai_dc XML schema along the lines suggested in XML Schema
>>  (http://oaitech.comm.nsdlib.org/WhitePapers/xml_schema_whitepaper.htm),
>>  should we also allow for optional inclusion of the dc-ed:audience
>>  element?
>>  Tom Habing and I have been experimenting this week with what an XSD
>>  that allows inclusion of the dc-ed:audience element might look like.
>>  When we're satisfied, I'll go ahead and post details of how this might
>>  be done (on the SourceForge XML Schema Forum) for the Committee's
>>  consideration.  It might be a complication we don't want to undertake,
>>  but on the other hand, it could facilitate use of OAI in the NSDL
>>  context, and likely in the IMLS context as well.  If we decide to
>>  revise oai_dc.xsd, we should at least consider it as an option.
>>  Tim Cole
>>  University of Illinois at Urbana-Champaign
>>  ----- Original Message -----
>>  From: <zubair@cs.odu.edu>
>>  To: <schalk@unf.edu>
>>  Cc: <collections-group@nsdl1.comm.nsdlib.org>;
>>  <oai-implementers@oaisrv.nsdl.cornell.edu>; "Carl Lagoze"
>>  <lagoze@cs.cornell.edu>
>>  Sent: Sunday, December 09, 2001 2:44 PM
>>  Subject: RE: [OAI-implementers] XML Schema for OAI compliance with NSDL harvesting
>>  >
>>  > Dear Stuart,
>>  >
>>  > I was at the Dec. 3-4 NSDL PIs meeting and may be able to add a little
>>  > more to what Carl has already stated. After talking to Diane Hillmann
>>  > and browsing information on the NSDL website, this is my understanding:
>>  >
>>  > "The NSDL Standards Working Group has determined that the Dublin Core
>>  > set of 15 qualified elements, plus the elements recommended by the DC
>>  > Education Working Group, will be the standard set used by the NSDL
>>  > metadata repository."
>>  > (Ref: http://siteforscience.nsdl.cornell.edu/metadata_info/overview.html#NSDL)
>>  >
>>  > Besides DC-Ed, they have also identified other formats NSDL plans to
>>  > support. You can get information about these from the Web sites:
>>  >
>>  > http://siteforscience.nsdl.cornell.edu/metadata_info/outline.html
>>  > http://www.smete.org/nsdl/workgroups/standards/standards_home.html
>>  >
>>  > These metadata formats can be supported as parallel metadata formats in
>>  > OAI.
>>  >
>>  > Regarding Mac OS X, I recently bought a Powerbook G4 with OS X. It has
>>  > full Java support and comes with the Apache web server. I have not yet
>>  > tried the Tomcat servlet engine on it, though one is available for it.
>>  > In summary, you should be able to host an OAI-compliant collection using
>>  > some of the Java-based tools available on the tools site pointed out by
>>  > Carl. In fact, you should even be able to use Perl tools (OS X is based
>>  > on BSD Unix).
>>  >
>>  >
>>  > Zubair
>>  >
>>  > From: "Carl Lagoze" <lagoze@cs.cornell.edu>
>>  > Sent by: oai-implementers-admin@oaisrv.nsdl.cornell.edu
>>  > To: "Stuart Chalk" <schalk@unf.edu>,
>>  >     <oai-implementers@oaisrv.nsdl.cornell.edu>
>>  > cc: <collections-group@nsdl1.comm.nsdlib.org>
>>  > Subject: RE: [OAI-implementers] XML Schema for OAI compliance with NSDL
>>  >     harvesting
>>  > Date: 12/09/2001 01:55 PM
>>  >
>>  > Stuart,
>>  >
>>  > Thanks for your question.  I've taken the liberty of adding the members
>>  > of NSDL collections to this response.
>>  >
>>  > Regarding your first question: The OAI already defines an XML schema for
>>  > its one required metadata format, Dublin Core.  This is explained at
>>  > http://www.openarchives.org/OAI_protocol/openarchivesprotocol.html#dublincore
>>  > and the actual schema is at http://www.openarchives.org/OAI/1.1/dc.xsd.
>>  > As for the other metadata formats that NSDL participants will share, as
>>  > far as I know there are no established schemas yet.  Perhaps Diane
>>  > Hillmann can chip in here and confirm whether this is true.  If there are
>>  > indeed no schemas yet, we should certainly settle on these in the near
>>  > future.
>>  >
>>  > Regarding your second question:  This is the right list for discussion
>>  > of hardware and software for acting as an OAI data provider.  At this
>>  > point I don't know of anyone who is using Mac OS X for this, but that
>>  > doesn't mean that no one is.  There is a growing list of tools at
>>  > http://www.openarchives.org/tools/tools.html and I believe that the
>>  > DLESE folks are working on a Java-based implementation that should run
>>  > with little problem on OS X.
>>  >
>>  > Carl
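For Stuart's first question, here is a minimal sketch of producing a simple, unqualified DC record with Python's standard xml.etree.ElementTree. The root element name dc, the dc prefix, and pairing the DC namespace with the dc.xsd URL in xsi:schemaLocation are assumptions made for illustration; the protocol document cited above is the authority on the exact form OAI 1.1 expects. The helper make_simple_dc is hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# Schema URL from the thread; using it in schemaLocation is an assumption.
DC_XSD = "http://www.openarchives.org/OAI/1.1/dc.xsd"

DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Use readable prefixes instead of ElementTree's default ns0, ns1, ...
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xsi", XSI_NS)


def make_simple_dc(fields: list[tuple[str, str]]) -> str:
    """Serialize (element, value) pairs as a simple, unqualified DC record.

    All 15 elements are optional and repeatable, so any subset and any
    number of repeats is accepted; any other element name is rejected.
    """
    root = ET.Element(
        f"{{{DC_NS}}}dc",  # hypothetical root element name
        {f"{{{XSI_NS}}}schemaLocation": f"{DC_NS} {DC_XSD}"},
    )
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple DC element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(make_simple_dc([
        ("title", "The Flow Analysis Database"),
        ("identifier", "http://www.fia.unf.edu/"),
    ]))
```

Rejecting unknown names is what keeps the output within option 1 of Carl's list; an element such as dc-ed:audience would need a separate, parallel metadata format.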
>>  >
>>  > > -----Original Message-----
>>  > > From: Stuart Chalk [mailto:schalk@unf.edu]
>>  > > Sent: Sunday, December 09, 2001 6:35 AM
>>  > > To: oai-implementers@oaisrv.nsdl.cornell.edu
>>  > > Subject: [OAI-implementers] XML Schema for OAI compliance with NSDL harvesting
>>  > >
>>  > >
>>  > > I would like to thank all of you who were at the meeting talking
>>  > > about harvesting and OAI.  Being a novice at this, I appreciate how
>>  > > well the area of OAI was described, and I now have a much better
>>  > > perspective on its use and implementation.
>>  > >
>>  > > My question, I hope, is simple. From reading
>>  > > http://www.openarchives.org/OAI_protocol/openarchivesprotocol.html
>>  > > I gather that I need to generate a few files that describe the
>>  > > different schemas for supplying metadata via OAI.  I also see the
>>  > > formatted XML that needs to be returned to the requester.  My question
>>  > > is: is there an NSDL format yet for the data returned via XML, and what
>>  > > is the minimum number of schema files I need to generate so that I can
>>  > > support NSDL harvesting via OAI?  If they do exist, where are they, and
>>  > > is there a help file that goes with them?
>>  > >
>>  > > Any help greatly appreciated.
>>  > >
>>  > > On a separate topic: is there a list like this for discussion of the
>>  > > hardware and software being used to serve the collection data?  As
>>  > > webmaster of the "Analytical Sciences Digital Library" project I want
>>  > > to use Mac OS X, Webstar V, Lasso Pro 5, and Filemaker Pro 5.5 all on
>>  > > the same Mac.  I am convinced that this will be both powerful enough
>>  > > and scalable enough, but my co-PIs are worried I will run into trouble
>>  > > using this for production.  Any pointers on where to go would also be
>>  > > greatly appreciated.
>>  > >
>>  > > --
>>  > > Stuart Chalk, Ph.D.
>>  > > Department of Natural Sciences
>>  > > University of North Florida
>>  > > 4567 St. Johns Bluff Road S.
>>  > > Jacksonville FL 32224 USA
>>  > > Phone: 904-620-2831
>>  > > Fax: 904-620-3885
>>  > > "The Flow Analysis Database": http://www.fia.unf.edu/
>>  >
>>  > _______________________________________________
>>  > OAI-implementers mailing list
>>  > OAI-implementers@oaisrv.nsdl.cornell.edu
>>  > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers