[OAI-implementers] XML Schema problem?

Young,Jeff jyoung@oclc.org
Mon, 23 Apr 2001 10:31:56 -0400


I tried the new schemas and Xerces seems to be happy with them. Thanks!


-----Original Message-----
From: herbert van de sompel [mailto:herbertv@cs.cornell.edu]
Sent: Sunday, April 22, 2001 2:44 PM
To: OAI-implementers
Subject: Re: [OAI-implementers] XML Schema problem?

hi Jeff,

Thanks for this.  Your diagnosis is correct: there is a problem in the
schemas that use the "status" attribute, namely those for GetRecord,
ListRecords and ListIdentifiers.

This is what the September 2000 schema specs say about specifying
occurrences of attributes.  The excerpt I include refers to the
following declaration in an xsd file:

<xsd:attribute name="country" type="xsd:NMTOKEN" use="fixed" value="US"/>

Attributes may appear once or not at all (the default), and so the
syntax for specifying occurrences of attributes is different than the
syntax for elements. In particular, a use attribute is used in an
attribute declaration to indicate whether the attribute is required or
optional, and if optional whether the attribute's value is fixed or
whether there is a default. A second attribute, value, provides any
value that is called for. To illustrate, po.xsd contains a declaration
for the country attribute, which is declared with use and value values
of fixed and US respectively. This declaration means that the appearance
of a country attribute is optional, although its value must be US if it
does appear, and if it does not appear, a schema processor will create a
country attribute with this value.
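To make the quoted rule concrete, here is a minimal Java sketch of how a schema processor resolves an optional attribute's value. It assumes a simplified model of attribute declarations; `ValueConstraint`, `AttributeDecl` and `effectiveValue` are hypothetical names, not Xerces API, and the record syntax needs Java 16 or later. With status declared as fixed to "deleted", a record that omits the attribute still reports "deleted".

```java
import java.util.Map;

// Kind of value constraint an optional attribute declaration can carry
// (hypothetical, simplified model of the primer text; not a parser API).
enum ValueConstraint { NONE, DEFAULT, FIXED }

// Hypothetical attribute declaration: name, kind of constraint, constraint value.
record AttributeDecl(String name, ValueConstraint constraint, String value) {

    // Returns the value a schema processor would report for this attribute,
    // or null when the attribute is absent and nothing is supplied.
    String effectiveValue(Map<String, String> attributes) {
        String actual = attributes.get(name);
        if (actual == null) {
            // Absent: both DEFAULT and FIXED make the processor create the attribute.
            return constraint == ValueConstraint.NONE ? null : value;
        }
        if (constraint == ValueConstraint.FIXED && !actual.equals(value)) {
            // Present: a fixed value must match exactly.
            throw new IllegalArgumentException(
                name + " must be \"" + value + "\" but was \"" + actual + "\"");
        }
        return actual;
    }
}
```

For example, `new AttributeDecl("status", ValueConstraint.FIXED, "deleted").effectiveValue(Map.of())` returns "deleted", even though the record carries no status attribute at all.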

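This last sentence indicates that Xerces is doing the right thing: the
status attribute is declared as fixed to "deleted", so the processor
creates status="deleted" on every record that lacks it.  That is
obviously not what we want to happen.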

With Michael Nelson, I have revised the XML Schemas that involve the
status attribute.  The solution was less straightforward than one would
hope.  There is no simple way to express what we really would like to
express: the status attribute may occur, and if it occurs its value must
be "deleted".  The workaround is to list legitimate values of the status
attribute and specify a default.  We chose the values to be "deleted"
and "not deleted", with "not deleted" as the default.  With this in
place, one can express in a schema that the status attribute may appear,
and that its default value (if the attribute does not appear) is "not
deleted".  One can also express that there is only one other legitimate
value for status.  It is "deleted".  And this one must be specified,
since it is not the default value.  
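The workaround can be sketched in Java as follows, assuming a simplified view of what a schema processor does; the `StatusDecl` class and its method names are hypothetical, not Xerces API. The status value is restricted to "deleted" and "not deleted", and "not deleted" is supplied whenever the attribute is absent.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model of the revised status declaration: an enumerated
// type with two legal values and "not deleted" as the default.
final class StatusDecl {
    static final Set<String> ALLOWED = Set.of("deleted", "not deleted");
    static final String DEFAULT_STATUS = "not deleted";

    private StatusDecl() {}

    // Returns the status a schema processor would report for a record.
    static String effectiveStatus(Map<String, String> attributes) {
        String actual = attributes.get("status");
        if (actual == null) {
            return DEFAULT_STATUS; // absent: the processor supplies the default
        }
        if (!ALLOWED.contains(actual)) {
            throw new IllegalArgumentException("invalid status: \"" + actual + "\"");
        }
        return actual;
    }

    // A record counts as deleted only when its effective status is "deleted".
    static boolean isDeleted(Map<String, String> attributes) {
        return "deleted".equals(effectiveStatus(attributes));
    }
}
```

Under this declaration an absent attribute and an explicit "not deleted" behave identically, so a harvester only needs to compare the reported value against "deleted".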

Using this approach nothing really changes for data providers (nor,
really, for service providers).  But I expect Xerces will now do the
right thing: add the default value of "not deleted" to all records that
do not have the status attribute specified.

The way to express the above approach in the schema is different for the
Sep/Oct 2000 specs that we use and for the most recent XML Schema specs.
But that is another story, to be addressed later.

I attach the edited xsd files.  I will put them in place, unless someone
disagrees with the approach taken.

many greetings


"Jeffrey A. Young" wrote:
> Someone noticed that my OAIHarvester hasn't been working correctly
> lately. It turns out that the Xerces XML parser is convinced that all
> the records I harvest are flagged as status="deleted". Since this
> clearly isn't the case, I started stripping the program down until I
> had a small example program showing this effect. The Java source code
> is attached. Basically, if I do DocumentBuilderFactory.setValidating(true)
> and then convert the XML to a Document, it silently "corrects" my
> records to status="deleted". If I dump the Document, all looks fine,
> but when I actually query the status attribute, it reports back with a
> value of "deleted". On the other hand, if I specify
> setValidating(false), everything works fine. I suspect the problem is
> that the XML Schema needs to make the status attribute optional.
> Another possibility is that Xerces is processing the XML Schema
> incorrectly. I can ignore the problem by always using
> setValidating(false), but that doesn't seem right. If someone has a
> better solution, I would appreciate it.
> Jeff
> ---
> Jeffrey A. Young
> Senior Consulting Systems Analyst
> Office of Research, Mail Code 710
> OCLC Online Computer Library Center, Inc.
> 6565 Frantz Road
> Dublin, OH   43017-3395
> www.oclc.org
> Voice:  614-764-4342
> Fax:            614-764-2344
> Email:  jyoung@oclc.org
> [Attachment: Test.java (application/octet-stream)]
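Jeff's underlying question is how a harvester can tell a record that really carries status="deleted" from one whose value was filled in by the parser. A minimal Java sketch using the standard JAXP/DOM APIs: DOM's `Attr.getSpecified()` reports `false` when an attribute's value came from a declared default rather than from the document. To stay self-contained, the example declares the default in an internal DTD subset instead of an XML Schema; the `StatusCheck` class and its method names are hypothetical and not taken from Test.java.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of Test.java.
final class StatusCheck {
    private StatusCheck() {}

    // Parses a document (non-validating) and returns its root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse record", e);
        }
    }

    // True only if status="deleted" was written in the document itself.
    // Attr.getSpecified() is false when the value came from a declared default.
    static boolean isDeletedInSource(Element record) {
        Attr status = record.getAttributeNode("status");
        return status != null && status.getSpecified() && "deleted".equals(status.getValue());
    }
}
```

Parsing `<!DOCTYPE record [<!ELEMENT record EMPTY><!ATTLIST record status CDATA "deleted">]><record/>` yields a root whose `getAttribute("status")` is "deleted", yet `isDeletedInSource` returns `false`. Whether a schema-validating parser marks XML Schema defaults the same way is an assumption to check against the parser actually in use.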

Herbert Van de Sompel
Visiting Assistant Professor
Cornell University -- Computer Science
tel + 1 - 607 - 255 - 3085
fax + 1 - 607 - 255 - 4428
digital life in libraries used to be primitive