[OAI-implementers] XML Schema problem?

Thomas G. Habing thabing@uiuc.edu
Mon, 23 Apr 2001 10:01:51 -0500


Herbert-

In the XSDs, wouldn't it be simpler to change the value of the use
attribute in the status attribute declaration to "optional" (deleting the
value attribute) and then tie its type to an enumerated list that only
allows the value "deleted"?  With no value attribute and the use attribute
set to "optional" (as opposed to "default" or "fixed") in the status
attribute declaration, the parser shouldn't assume a value.  The enumerated
list still restricts the allowable values of the status attribute in
document instances.  This seems to work in other parsers, but we've not
tried it in Xerces.  Here's the attribute declaration as we're suggesting:

  <complexType name="recordType">
   <sequence>
     <element name="header" minOccurs="1" maxOccurs="1" type="oai:headerType"/>
     <element name="metadata" minOccurs="0" maxOccurs="1" type="oai:metadataType"/>
     <element name="about" minOccurs="0" maxOccurs="1" type="oai:aboutType"/>
   </sequence>
   <attribute name="status" use="optional" type="oai:statusType"/>
  </complexType>

 ...

  <simpleType name="statusType">
   <restriction base="string">
     <enumeration value="deleted"/>
   </restriction>
  </simpleType>
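
For anyone who wants to check this behavior in their own parser, here is a
minimal sketch in Java.  It is not Jeff's Test.java; the class and method
names (StatusCheck, status) and the cut-down schemas are made up for
illustration.  It also assumes a JAXP parser that understands the final
XML Schema 1.0 syntax (2001 namespace, where a default= attribute replaces
use="default" value="..."), not the 2000/10 draft syntax used in the OAI
schemas.  It compares the "optional, no default" declaration with the
"default" declaration for a record that carries no status attribute.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class StatusCheck {

    // Hypothetical cut-down schema: a bare <record> element with a status attribute.
    // Assumption: final XML Schema 1.0 syntax, not the 2000/10 draft.
    static String schema(String attributeDecl, String enumerations) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='record'><xs:complexType>"
             + attributeDecl
             + "</xs:complexType></xs:element>"
             + "<xs:simpleType name='statusType'><xs:restriction base='xs:string'>"
             + enumerations
             + "</xs:restriction></xs:simpleType>"
             + "</xs:schema>";
    }

    // Optional attribute, no default, only "deleted" allowed.
    static final String OPTIONAL_XSD = schema(
        "<xs:attribute name='status' use='optional' type='statusType'/>",
        "<xs:enumeration value='deleted'/>");

    // Two allowed values, with "not deleted" as the default.
    static final String DEFAULT_XSD = schema(
        "<xs:attribute name='status' default='not deleted' type='statusType'/>",
        "<xs:enumeration value='deleted'/><xs:enumeration value='not deleted'/>");

    // Parses xml with validation against xsd and returns the status attribute
    // of the root element as the application sees it, or null if it is absent.
    static String status(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setSchema(schema);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn validation errors into exceptions instead of stderr messages.
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            Element record = builder
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            return record.hasAttribute("status") ? record.getAttribute("status") : null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(status(OPTIONAL_XSD, "<record/>"));
        System.out.println(status(OPTIONAL_XSD, "<record status='deleted'/>"));
        System.out.println(status(DEFAULT_XSD, "<record/>"));
    }
}
```

The point of the comparison is that a harvester only has to test for the
presence of the attribute under the first declaration, while under the
second it has to compare the value, since every validated record ends up
with a status.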

Tim Cole
Tom Habing
University of Illinois

herbert van de sompel wrote:
> 
> hi Jeff,
> 
> Thanks for this.  Your observation is correct: there is a problem in
> the schemas that use the "status" attribute, namely GetRecord,
> ListRecords and ListIdentifiers.
> 
> This is what the September 2000 schema specs say about specifying
> occurrences of attributes.  In the excerpt that I include, reference is
> made to the following declaration in an xsd file:
> 
> <xsd:attribute name="country" type="xsd:NMTOKEN" use="fixed"
> value="US"/>
> 
> "
> Attributes may appear once or not at all (the default), and so the
> syntax for specifying occurrences of attributes
> is different than the syntax for elements. In particular, a use
> attribute is used in an attribute declaration to
> indicate whether the attribute is required or optional, and if optional
> whether the attribute's value is fixed or
> whether there is a default. A second attribute, value, provides any
> value that is called for. To illustrate, po.xsd
> contains a declaration for the country attribute, which is declared with
> use and value values of fixed and US
> respectively. This declaration means that the appearance of a country
> attribute is optional, although its value
> must be US if it does appear, and if it does not appear, a schema
> processor will create a country attribute with
> this value.
> "
> 
> This last line indicates that Xerces is doing the right thing, which is
> obviously not what we want to happen.
> 
> With Michael Nelson, I have revised the XML Schemas that involve a
> status attribute.  The solution was less straightforward than one would
> hope.  There is no simple way to express what we really would like to
> express: the status attribute may occur, and if it occurs its value must
> be "deleted".  The workaround is to list legitimate values of the status
> attribute and specify a default.  We chose the values to be "deleted"
> and "not deleted", with "not deleted" as the default.  With this in
> place, one can express in a schema that the status attribute may appear,
> and that its default value (if the attribute does not appear) is "not
> deleted".  One can also express that there is only one other legitimate
> value for status.  It is "deleted".  And this one must be specified,
> since it is not the default value.
> 
> Using this approach, nothing really changes for data providers (nor
> for service providers, really).  But I guess Xerces will now do the
> right thing: add the default value of "not deleted" to all records that
> do not have the status attribute specified.
> 
> The way to express the above approach in the schema is different for the
> Sep/Oct 2000 specs that we use and for the most recent XML Schema specs,
> but that is another story, to be addressed later.
> 
> I attach the edited xsd files.  I will put them in place, unless someone
> disagrees with the approach taken.
> 
> many greetings
> 
> herbert
> 
> "Jeffrey A. Young" wrote:
> >
> > Someone noticed that my OAIHarvester isn't working correctly lately. It
> > turns out that the Xerces XML parser is convinced that all the records I
> > harvest are flagged as status="deleted". Since this clearly isn't the case,
> > I started stripping the program down until I had a small example program
> > showing this effect. The Java source code is attached. Basically, if I do
> > DocumentBuilderFactory.setValidating(true) and then convert the XML to a DOM
> > Document, it silently "corrects" my records to status="deleted". If I dump
> > the Document, all looks fine, but when I actually query the status
> > attribute, it reports back with a value of "deleted". On the other hand, if
> > I specify setValidating(false), everything works fine. I suspect the problem
> > is that the XML Schema needs to make the status attribute optional. Another
> > possibility is that Xerces is processing the XML Schema incorrectly. I can
> > ignore the problem by always using setValidating(false), but that doesn't
> > seem right. If someone has a better solution, I would appreciate it. Thanks.
> >
> > Jeff
> >
> > ---
> > Jeffrey A. Young
> > Senior Consulting Systems Analyst
> > Office of Research, Mail Code 710
> > OCLC Online Computer Library Center, Inc.
> > 6565 Frantz Road
> > Dublin, OH   43017-3395
> > www.oclc.org
> >
> > Voice:  614-764-4342
> > Fax:            614-764-2344
> > Email:  jyoung@oclc.org
> >
> >   ------------------------------------------------------------------------
> >                 Name: Test.java
> >    Test.java    Type: unspecified type (application/octet-stream)
> >             Encoding: quoted-printable
> 
> --
> Herbert Van de Sompel
> Visiting Assistant Professor
> Cornell University -- Computer Science
> tel + 1 - 607 - 255 - 3085
> fax + 1 - 607 - 255 - 4428
> http://www.cs.cornell.edu/people/herbertv/
> digital life in libraries used to be primitive
> 
>   ----------------------------------------------------------------------------
> <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
>          xmlns:oai="http://www.openarchives.org/OAI/1.0/OAI_GetRecord"
>          targetNamespace="http://www.openarchives.org/OAI/1.0/OAI_GetRecord"
>          elementFormDefault="qualified"
>          attributeFormDefault="unqualified">
> 
>  <annotation>
>   <documentation>
>     Schema to verify validity of responses to GetRecord OAI-protocol request.
>     This Schema validated at http://www.w3.org/2000/09/webdata/xsv on 2001-04-22
>     with XSV 1.176/1.87 of 2001/02/16 16:38:43
>   </documentation>
>  </annotation>
> 
>  <element name="GetRecord" type="oai:GetRecordType"/>
> 
>  <!-- response to GetRecord-request -->
> 
>  <complexType name="GetRecordType">
>   <sequence>
>     <element name="responseDate" minOccurs="1" maxOccurs="1" type="timeInstant"/>
>     <element name="requestURL" minOccurs="1" maxOccurs="1" type="string"/>
>     <element name="record" minOccurs="0" maxOccurs="1" type="oai:recordType"/>
>   </sequence>
>  </complexType>
> 
>  <!-- define recordType -->
>  <!-- a record has a header and a metadata part -->
> 
>  <complexType name="recordType">
>   <sequence>
>     <element name="header" minOccurs="1" maxOccurs="1" type="oai:headerType"/>
>     <element name="metadata" minOccurs="0" maxOccurs="1" type="oai:metadataType"/>
>     <element name="about" minOccurs="0" maxOccurs="1" type="oai:aboutType"/>
>   </sequence>
>     <attribute name="status" use="default" value="not deleted" type="oai:statusType"/>
>  </complexType>
> 
>  <!-- define headerType -->
>  <!-- a header has a unique identifier and a datestamp -->
> 
>  <complexType name="headerType">
>   <sequence>
>     <element name="identifier" minOccurs="1" maxOccurs="1" type="uriReference"/>
>     <element name="datestamp" minOccurs="1" maxOccurs="1" type="date"/>
>   </sequence>
>  </complexType>
> 
>  <!-- define metadataType -->
>  <!-- metadata must be expressed in XML that is compliant with another XML Schema -->
>  <!-- metadata must be explicitly qualified in the response -->
> 
>  <complexType name="metadataType">
>   <sequence>
>    <any namespace="##any" processContents="lax"/>
>   </sequence>
>  </complexType>
> 
>  <!-- define aboutType -->
>  <!-- data "about" the record must be expressed in XML -->
>  <!-- that is compliant with an XML Schema defined by a community -->
> 
>  <complexType name="aboutType">
>   <sequence>
>    <any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="1"/>
>   </sequence>
>  </complexType>
> 
>  <!-- define statusType -->
>  <!-- a record can have a status of "deleted" or "not deleted". -->
> 
>  <simpleType name="statusType">
>    <restriction base="string">
>     <enumeration value="deleted"/>
>     <enumeration value="not deleted"/>
>    </restriction>
>   </simpleType>
> 
>  </schema>
> 
>   ----------------------------------------------------------------------------
> <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
>           xmlns:oai="http://www.openarchives.org/OAI/1.0/OAI_ListRecords"
>           targetNamespace="http://www.openarchives.org/OAI/1.0/OAI_ListRecords"
>           elementFormDefault="qualified"
>           attributeFormDefault="unqualified">
> 
>   <annotation>
>    <documentation>
>      Schema to verify validity of responses to ListRecords OAI-protocol request.
>     This Schema validated at http://www.w3.org/2000/09/webdata/xsv on 2001-04-22
>     with XSV 1.176/1.87 of 2001/02/16 16:38:43
>    </documentation>
>   </annotation>
> 
>   <element name="ListRecords" type="oai:ListRecordsType"/>
> 
>   <!-- response to ListRecords-request -->
>   <!-- this response may contain an optional resumptionToken -->
> 
>   <complexType name="ListRecordsType">
>    <sequence>
>      <element name="responseDate" minOccurs="1" maxOccurs="1" type="timeInstant"/>
>      <element name="requestURL" minOccurs="1" maxOccurs="1" type="string"/>
>      <element name="record" minOccurs="0" maxOccurs="unbounded" type="oai:recordType"/>
>      <element name="resumptionToken" minOccurs="0" maxOccurs="1" type="string"/>
>    </sequence>
>    </complexType>
> 
>   <!-- define recordType -->
>   <!-- a record has a header and a metadata part -->
> 
>   <complexType name="recordType">
>    <sequence>
>      <element name="header" minOccurs="1" maxOccurs="1" type="oai:headerType"/>
>      <element name="metadata" minOccurs="0" maxOccurs="1" type="oai:metadataType"/>
>      <element name="about" minOccurs="0" maxOccurs="1" type="oai:aboutType"/>
>    </sequence>
>     <attribute name="status" use="default" value="not deleted" type="oai:statusType"/>
>   </complexType>
> 
>   <!-- define headerType -->
>   <!-- a header has a unique identifier and a datestamp -->
> 
>   <complexType name="headerType">
>    <sequence>
>      <element name="identifier" minOccurs="1" maxOccurs="1" type="uriReference"/>
>      <element name="datestamp" minOccurs="1" maxOccurs="1" type="date"/>
>    </sequence>
>   </complexType>
> 
>   <!-- define metadataType -->
>   <!-- metadata must be expressed in XML that complies with another XML Schema -->
>   <!-- metadata must be explicitly qualified in the response -->
> 
>   <complexType name="metadataType">
>    <sequence>
>     <any namespace="##any" processContents="lax"/>
>    </sequence>
>   </complexType>
> 
>  <!-- define aboutType -->
>  <!-- data "about" the record must be expressed in XML -->
>  <!-- that is compliant with an XML Schema defined by a community -->
> 
>  <complexType name="aboutType">
>   <sequence>
>    <any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="1"/>
>   </sequence>
>  </complexType>
> 
>  <!-- define statusType -->
>  <!-- a record can have a status of "deleted" or "not deleted". -->
> 
>  <simpleType name="statusType">
>    <restriction base="string">
>     <enumeration value="deleted"/>
>     <enumeration value="not deleted"/>
>    </restriction>
>   </simpleType>
> 
> </schema>
> 
>   ----------------------------------------------------------------------------
> <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
>          xmlns:oai="http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers"
>          targetNamespace="http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers"
>          elementFormDefault="qualified"
>          attributeFormDefault="unqualified">
> 
>  <annotation>
>   <documentation>
>     Schema to verify validity of responses to ListIdentifiers OAI-protocol request.
>     This Schema validated at http://www.w3.org/2000/09/webdata/xsv on 2001-04-22
>     with XSV 1.176/1.87 of 2001/02/16 16:38:43
>   </documentation>
>  </annotation>
> 
>  <element name="ListIdentifiers" type="oai:ListIdentifiersType"/>
> 
>  <!-- response to ListIdentifiers-request -->
>  <!-- records have an optional "deleted" status -->
>  <!-- this response may contain an optional resumptionToken -->
> 
>  <complexType name="ListIdentifiersType">
>   <sequence>
>     <element name="responseDate" minOccurs="1" maxOccurs="1" type="timeInstant"/>
>     <element name="requestURL" minOccurs="1" maxOccurs="1" type="string"/>
>     <element ref="oai:identifier" minOccurs="0" maxOccurs="unbounded"/>
>     <element name="resumptionToken" minOccurs="0" maxOccurs="1" type="string"/>
>   </sequence>
>  </complexType>
> 
>  <element name="identifier">
>   <complexType>
>    <simpleContent>
>     <extension base="uriReference">
>      <attribute name="status" use="default" value="not deleted" type="oai:statusType"/>
>     </extension>
>    </simpleContent>
>   </complexType>
>  </element>
> 
>  <!-- define statusType -->
>  <!-- a record can have a status of "deleted" or "not deleted". -->
> 
>  <simpleType name="statusType">
>    <restriction base="string">
>     <enumeration value="deleted"/>
>     <enumeration value="not deleted"/>
>    </restriction>
>   </simpleType>
> 
>  </schema>

-- 
Thomas G. Habing
Research Programmer, Digital Library Initiative
University of Illinois at Urbana-Champaign
052 Grainger Engineering Library, MC-274
thabing@uiuc.edu, (217) 244-7809