[OAI-implementers] XML Schema problem?

herbert van de sompel herbertv@cs.cornell.edu
Sun, 22 Apr 2001 14:44:26 -0400


This is a multi-part message in MIME format.
--------------763AD80798C4E0A8A1B247FF
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

hi Jeff,

Thanks for this.  Your analysis is correct: there is a problem in
the schemas that use the "status" attribute, namely those for GetRecord,
ListRecords and ListIdentifiers.

This is what the September 2000 schema specs say about specifying
occurrences of attributes.  The excerpt that I include refers to the
following declaration in an xsd file:

<xsd:attribute name="country" type="xsd:NMTOKEN" use="fixed"
value="US"/>

"
Attributes may appear once or not at all (the default), and so the
syntax for specifying occurrences of attributes is different than the
syntax for elements. In particular, a use attribute is used in an
attribute declaration to indicate whether the attribute is required or
optional, and if optional whether the attribute's value is fixed or
whether there is a default. A second attribute, value, provides any
value that is called for. To illustrate, po.xsd contains a declaration
for the country attribute, which is declared with use and value values
of fixed and US respectively. This declaration means that the
appearance of a country attribute is optional, although its value must
be US if it does appear, and if it does not appear, a schema processor
will create a country attribute with this value.
"

The last sentence of the excerpt indicates that Xerces is doing the
right thing according to the specs, even though the result is obviously
not what we want to happen.
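
To make the quoted rule concrete, here is a minimal Java sketch of a
schema-validating DOM parse that reports a status value for a record
carrying no status attribute.  The assumptions are not from this thread:
the one-element schema and the class name FixedStatusSketch are
hypothetical, the schema is written in the later XML Schema 1.0
Recommendation syntax (fixed="deleted") rather than the September 2000
use/value syntax, and the parser is the JAXP implementation bundled with
the JDK rather than the Xerces release discussed here.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FixedStatusSketch {
    // Hypothetical schema: an optional status attribute with a fixed value.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='record'><xs:complexType>"
      + "<xs:attribute name='status' type='xs:string' fixed='deleted'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    /** Parses recordXml against SCHEMA and returns status as the DOM reports it. */
    static String statusOf(String recordXml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setSchema(schema); // schema-aware parse, may add missing attributes
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(recordXml)));
            return doc.getDocumentElement().getAttribute("status");
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // The instance document never mentions status.
        System.out.println(statusOf("<record/>"));
    }
}
```

With a schema-aware parser the value returned for "<record/>" should be
the fixed value from the schema, not an empty string, which matches the
effect Jeff describes.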

With Michael Nelson, I have revised the XML Schemas that involve a
status attribute.  The solution was less straightforward than one would
hope.  There is no simple way to express what we really would like to
express: the status attribute may occur, and if it occurs its value must
be "deleted".  The workaround is to list the legitimate values of the
status attribute and specify a default.  We chose the values "deleted"
and "not deleted", with "not deleted" as the default.  With this in
place, the schema expresses that the status attribute may appear, and
that its value defaults to "not deleted" if it does not appear.  It
also expresses that the only other legitimate value for status is
"deleted", which must be specified explicitly because it is not the
default.


With this approach nothing really changes for data providers (nor for
service providers, really).  But I guess Xerces will now do the right
thing and add the default value of "not deleted" to all records that do
not have the status attribute specified.
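
From a harvester's point of view, the revised declaration could be
checked with a sketch like the following.  It is an illustration under
stated assumptions, not the attached schemas: the cut-down schema, the
class DefaultStatusSketch and the helper isDeleted are hypothetical, and
the schema uses the XML Schema 1.0 Recommendation syntax
(default="not deleted") with the JDK's bundled JAXP parser instead of
the 2000/10 use/value syntax.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DefaultStatusSketch {
    // Hypothetical schema: status is enumerated and defaults to "not deleted".
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:simpleType name='statusType'><xs:restriction base='xs:string'>"
      + "<xs:enumeration value='deleted'/>"
      + "<xs:enumeration value='not deleted'/>"
      + "</xs:restriction></xs:simpleType>"
      + "<xs:element name='record'><xs:complexType>"
      + "<xs:attribute name='status' type='statusType' default='not deleted'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    /** Parses recordXml against SCHEMA and returns status as the DOM reports it. */
    static String statusOf(String recordXml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setSchema(schema); // schema-aware parse, may add missing attributes
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(recordXml)));
            return doc.getDocumentElement().getAttribute("status");
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    /** A record counts as deleted only when status is exactly "deleted". */
    static boolean isDeleted(String recordXml) {
        return "deleted".equals(statusOf(recordXml));
    }

    public static void main(String[] args) {
        System.out.println(isDeleted("<record/>"));
        System.out.println(isDeleted("<record status='deleted'/>"));
    }
}
```

Comparing against the literal "deleted" keeps the harvester logic the
same whether or not the parser validates: a missing attribute and an
explicit "not deleted" are both treated as a live record.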

The way to express the above approach in the schema is different for the
Sep/Oct 2000 specs that we use and for the most recent XML Schema specs.
But that is another story, to be addressed later.

I attach the edited xsd files.  I will put them in place, unless someone
disagrees with the approach taken.

many greetings

herbert



"Jeffrey A. Young" wrote:
> 
> Someone noticed that my OAIHarvester isn't working correctly lately. It
> turns out that the Xerces XML parser is convinced that all the records I
> harvest are flagged as status="deleted". Since this clearly isn't the case,
> I started stripping the program down until I had a small example program
> showing this effect. The Java source code is attached. Basically, if I do
> DocumentBuilderFactory.setValidating(true) and then convert the XML to a DOM
> Document, it silently "corrects" my records to status="deleted". If I dump
> the Document, all looks fine, but when I actually query the status
> attribute, it reports back with a value of "deleted". On the other hand, if
> I specify setValidating(false), everything works fine. I suspect the problem
> is that the XML Schema needs to make the status attribute optional. Another
> possibility is that Xerces is processing the XML Schema incorrectly. I can
> ignore the problem by always using setValidating(false), but that doesn't
> seem right. If someone has a better solution, I would appreciate it. Thanks.
> 
> Jeff
> 
> ---
> Jeffrey A. Young
> Senior Consulting Systems Analyst
> Office of Research, Mail Code 710
> OCLC Online Computer Library Center, Inc.
> 6565 Frantz Road
> Dublin, OH   43017-3395
> www.oclc.org
> 
> Voice:  614-764-4342
> Fax:            614-764-2344
> Email:  jyoung@oclc.org
> 
>   ------------------------------------------------------------------------
>                 Name: Test.java
>    Test.java    Type: unspecified type (application/octet-stream)
>             Encoding: quoted-printable

-- 
Herbert Van de Sompel
Visiting Assistant Professor
Cornell University -- Computer Science
tel + 1 - 607 - 255 - 3085
fax + 1 - 607 - 255 - 4428
http://www.cs.cornell.edu/people/herbertv/
digital life in libraries used to be primitive
--------------763AD80798C4E0A8A1B247FF
Content-Type: text/plain; charset=us-ascii;
 name="OAI_GetRecord-status.xsd"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="OAI_GetRecord-status.xsd"

<schema xmlns="http://www.w3.org/2000/10/XMLSchema"
         xmlns:oai="http://www.openarchives.org/OAI/1.0/OAI_GetRecord"
         targetNamespace="http://www.openarchives.org/OAI/1.0/OAI_GetRecord"
         elementFormDefault="qualified"
         attributeFormDefault="unqualified">

 <annotation>
  <documentation>
    Schema to verify validity of responses to GetRecord OAI-protocol request.
    This Schema validated at http://www.w3.org/2000/09/webdata/xsv on 2001-04-22
    with XSV 1.176/1.87 of 2001/02/16 16:38:43
  </documentation>
 </annotation>

 <element name="GetRecord" type="oai:GetRecordType"/>

 <!-- response to GetRecord-request -->

 <complexType name="GetRecordType">
  <sequence>
    <element name="responseDate" minOccurs="1" maxOccurs="1" type="timeInstant"/>
    <element name="requestURL" minOccurs="1" maxOccurs="1" type="string"/>
    <element name="record" minOccurs="0" maxOccurs="1" type="oai:recordType"/>
  </sequence>
 </complexType>

 <!-- define recordType -->
 <!-- a record has a header and a metadata part -->

 <complexType name="recordType">
  <sequence>
    <element name="header" minOccurs="1" maxOccurs="1" type="oai:headerType"/>
    <element name="metadata" minOccurs="0" maxOccurs="1" type="oai:metadataType"/>
    <element name="about" minOccurs="0" maxOccurs="1" type="oai:aboutType"/>
  </sequence>
  <attribute name="status" use="default" value="not deleted" type="oai:statusType"/>
 </complexType>

 <!-- define headerType -->
 <!-- a header has a unique identifier and a datestamp -->

 <complexType name="headerType">
  <sequence>
    <element name="identifier" minOccurs="1" maxOccurs="1" type="uriReference"/>
    <element name="datestamp" minOccurs="1" maxOccurs="1" type="date"/>
  </sequence>
 </complexType>

 <!-- define metadataType -->
 <!-- metadata must be expressed in XML that is compliant with another XML Schema -->
 <!-- metadata must be explicitly qualified in the response -->

 <complexType name="metadataType">
  <sequence>
   <any namespace="##any" processContents="lax"/>
  </sequence>
 </complexType>

 <!-- define aboutType -->
 <!-- data "about" the record must be expressed in XML -->
 <!-- that is compliant with an XML Schema defined by a community -->

 <complexType name="aboutType">
  <sequence>
   <any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="1"/>
  </sequence>
 </complexType>

 <!-- define statusType -->
 <!-- a record can have a status of "deleted" or "not deleted". -->

 <simpleType name="statusType">
  <restriction base="string">
   <enumeration value="deleted"/>
   <enumeration value="not deleted"/>
  </restriction>
 </simpleType>

</schema>

--------------763AD80798C4E0A8A1B247FF
Content-Type: text/plain; charset=us-ascii;
 name="OAI_ListRecords-status.xsd"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="OAI_ListRecords-status.xsd"

<schema xmlns="http://www.w3.org/2000/10/XMLSchema"
          xmlns:oai="http://www.openarchives.org/OAI/1.0/OAI_ListRecords"
          targetNamespace="http://www.openarchives.org/OAI/1.0/OAI_ListRecords"
          elementFormDefault="qualified"
          attributeFormDefault="unqualified">

  <annotation>
   <documentation>
     Schema to verify validity of responses to ListRecords OAI-protocol request.
    This Schema validated at http://www.w3.org/2000/09/webdata/xsv on 2001-04-22
    with XSV 1.176/1.87 of 2001/02/16 16:38:43
   </documentation>
  </annotation>

  <element name="ListRecords" type="oai:ListRecordsType"/>

  <!-- response to ListRecords-request -->
  <!-- this response may contain an optional resumptionToken -->

  <complexType name="ListRecordsType">
   <sequence>
     <element name="responseDate" minOccurs="1" maxOccurs="1" type="timeInstant"/>
     <element name="requestURL" minOccurs="1" maxOccurs="1" type="string"/>
     <element name="record" minOccurs="0" maxOccurs="unbounded" type="oai:recordType"/>
     <element name="resumptionToken" minOccurs="0" maxOccurs="1" type="string"/>
   </sequence>
  </complexType>

  <!-- define recordType -->
  <!-- a record has a header and a metadata part -->

  <complexType name="recordType">
   <sequence>
     <element name="header" minOccurs="1" maxOccurs="1" type="oai:headerType"/>
     <element name="metadata" minOccurs="0" maxOccurs="1" type="oai:metadataType"/>
     <element name="about" minOccurs="0" maxOccurs="1" type="oai:aboutType"/>
   </sequence>
   <attribute name="status" use="default" value="not deleted" type="oai:statusType"/>
  </complexType>

  <!-- define headerType -->
  <!-- a header has a unique identifier and a datestamp -->

  <complexType name="headerType">
   <sequence>
     <element name="identifier" minOccurs="1" maxOccurs="1" type="uriReference"/>
     <element name="datestamp" minOccurs="1" maxOccurs="1" type="date"/>
   </sequence>
  </complexType>

  <!-- define metadataType -->
  <!-- metadata must be expressed in XML that complies with another XML Schema -->
  <!-- metadata must be explicitly qualified in the response -->

  <complexType name="metadataType">
   <sequence>
    <any namespace="##any" processContents="lax"/>
   </sequence>
  </complexType>

 <!-- define aboutType -->
 <!-- data "about" the record must be expressed in XML -->
 <!-- that is compliant with an XML Schema defined by a community -->

 <complexType name="aboutType">
  <sequence>
   <any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="1"/>
  </sequence>
 </complexType>

 <!-- define statusType -->
 <!-- a record can have a status of "deleted" or "not deleted". -->

 <simpleType name="statusType">
  <restriction base="string">
   <enumeration value="deleted"/>
   <enumeration value="not deleted"/>
  </restriction>
 </simpleType>

</schema>
--------------763AD80798C4E0A8A1B247FF
Content-Type: text/plain; charset=us-ascii;
 name="OAI_ListIdentifiers-status.xsd"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="OAI_ListIdentifiers-status.xsd"

<schema xmlns="http://www.w3.org/2000/10/XMLSchema"
         xmlns:oai="http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers"
         targetNamespace="http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers"
         elementFormDefault="qualified"
         attributeFormDefault="unqualified">

 <annotation>
  <documentation>
    Schema to verify validity of responses to ListIdentifiers OAI-protocol request.
    This Schema validated at http://www.w3.org/2000/09/webdata/xsv on 2001-04-22
    with XSV 1.176/1.87 of 2001/02/16 16:38:43
  </documentation>
 </annotation>

 <element name="ListIdentifiers" type="oai:ListIdentifiersType"/>

 <!-- response to ListIdentifiers-request -->
 <!-- records have an optional "deleted" status -->
 <!-- this response may contain an optional resumptionToken -->

 <complexType name="ListIdentifiersType">
  <sequence>
    <element name="responseDate" minOccurs="1" maxOccurs="1" type="timeInstant"/>
    <element name="requestURL" minOccurs="1" maxOccurs="1" type="string"/>
    <element ref="oai:identifier" minOccurs="0" maxOccurs="unbounded"/>
    <element name="resumptionToken" minOccurs="0" maxOccurs="1" type="string"/>
  </sequence>
 </complexType>

 <element name="identifier">
  <complexType>
   <simpleContent>
    <extension base="uriReference">
     <attribute name="status" use="default" value="not deleted" type="oai:statusType"/>
    </extension>
   </simpleContent>
  </complexType>
 </element>

 <!-- define statusType -->
 <!-- a record can have a status of "deleted" or "not deleted". -->

 <simpleType name="statusType">
  <restriction base="string">
   <enumeration value="deleted"/>
   <enumeration value="not deleted"/>
  </restriction>
 </simpleType>

</schema>
--------------763AD80798C4E0A8A1B247FF--