[OAI-implementers] XML Schemas and Xerces again

Young,Jeff jyoung@oclc.org
Tue, 24 Apr 2001 16:46:28 -0400


I'm happy to say that the status=deleted problem appears to be resolved.
Unfortunately, I now have a different, unrelated problem. Someone reported
to me that Xerces 1.3.1 reports an XML schema error where 1.3.0 didn't. It
turns out that I had failed to call setErrorHandler(), which is key to
reporting any validation errors. Xerces 1.3.0 let this slide, whereas 1.3.1
complains about it. Now that I've corrected this oversight, I'm seeing
parser errors related to the XML schema. I've attached another small demo
application that shows the effects. To add to the confusion, 1.3.0 reports
a different error than 1.3.1 does.
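
For illustration, here is a minimal sketch of why the setErrorHandler() call matters: validation problems are delivered to ErrorHandler.error(), so a program only sees them if it registers a handler. The class name, method name, and the tiny inline-DTD document are assumptions made for this sketch (they are not part of the attached demo), and it uses DTD validation so that it runs without fetching any schema over the network.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class ErrorHandlerDemo {
    // Hypothetical document, deliberately invalid against its inline DTD:
    // <a> must contain <b>, but it contains <c>.
    static final String INVALID_XML =
        "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY><!ELEMENT c EMPTY>]>"
        + "<a><c/></a>";

    // Parses with validation on and returns every recoverable error reported
    // to the registered handler.
    static List<String> collectValidationErrors(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setNamespaceAware(true);
            DocumentBuilder parser = factory.newDocumentBuilder();
            parser.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) {
                    // Warnings are ignored in this sketch.
                }
                public void error(SAXParseException e) {
                    // Validation errors arrive here; record them and continue.
                    errors.add(e.getMessage());
                }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    // Well-formedness errors cannot be recovered from.
                    throw e;
                }
            });
            parser.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        for (String message : collectValidationErrors(INVALID_XML)) {
            System.out.println("validation error: " + message);
        }
    }
}
```

Collecting the messages instead of rethrowing, as the attached demo does, lets the parse run to the end and report every problem in one pass.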

Using Xerces 1.3.0, the demo application produces:

error
org.xml.sax.SAXParseException: Datatype error: In element 'identifier' : Value 'oai:etdcat:ocm02999966' is a Malformed URI .
        at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1068)
        at org.apache.xerces.validators.common.XMLValidator.checkContent(XMLValidator.java:3609)
        at org.apache.xerces.validators.common.XMLValidator.callEndElement(XMLValidator.java:1133)
        at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1201)
        at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
        at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:952)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:123)
        at Test.main(Test.java:34)
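
As an independent sanity check on the "Malformed URI" complaint, the identifier can be run through a general-purpose URI parser. This is only a sketch: java.net.URI is a later addition to the JDK and is not the code Xerces uses for its datatype check, and the class and method names here are illustrative.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriCheck {
    // Returns the parsed URI, or null if java.net.URI rejects the syntax.
    static URI parseOrNull(String value) {
        try {
            return new URI(value);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        URI uri = parseOrNull("oai:etdcat:ocm02999966");
        if (uri == null) {
            System.out.println("rejected");
        } else {
            // An opaque URI has a scheme followed by a part that does not
            // start with '/', in the same way as urn:isbn:... identifiers.
            System.out.println("scheme=" + uri.getScheme()
                + " opaque=" + uri.isOpaque()
                + " rest=" + uri.getSchemeSpecificPart());
        }
    }
}
```

java.net.URI accepts the value as an opaque URI with scheme "oai", which is consistent with the identifier being syntactically legal and the 1.3.0 check being too strict.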

Using Xerces 1.3.1, the demo produces:

error
org.xml.sax.SAXParseException: The content of element type "metadata" must match "##any:uri=http://www.openarchives.org/OAI/1.0/OAI_ListRecords".
        at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1067)
        at org.apache.xerces.validators.common.XMLValidator.reportRecoverableXMLError(XMLValidator.java:1689)
        at org.apache.xerces.validators.common.XMLValidator.callEndElement(XMLValidator.java:1353)
        at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1205)
        at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
        at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:952)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:172)
        at Test.main(Test.java:34)
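
The 1.3.1 message names the OAI_ListRecords namespace, while the child of <metadata> in the demo document declares the Dublin Core namespace. Whether that mismatch is what the validator is objecting to is not established here. The sketch below only confirms which namespaces a namespace-aware, non-validating parse assigns to the two elements. It uses a trimmed copy of the demo document, and the class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class MetadataNamespace {
    // Trimmed from the demo document: only what is needed to reach <metadata>.
    static final String XML =
        "<ListRecords xmlns=\"http://www.openarchives.org/OAI/1.0/OAI_ListRecords\">"
        + "<record><metadata>"
        + "<dc xmlns=\"http://purl.org/dc/elements/1.1/\"><language>eng</language></dc>"
        + "</metadata></record></ListRecords>";

    // Returns { namespace of <metadata>, namespace of its first element child }.
    static String[] namespaces(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // required for getNamespaceURI()
            factory.setValidating(false);      // inspect only, no validation
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element metadata =
                (Element) doc.getElementsByTagNameNS("*", "metadata").item(0);
            // Skip any text nodes to find the first element child.
            Node child = metadata.getFirstChild();
            while (child != null && child.getNodeType() != Node.ELEMENT_NODE) {
                child = child.getNextSibling();
            }
            return new String[] {
                metadata.getNamespaceURI(),
                child == null ? null : child.getNamespaceURI()
            };
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] ns = namespaces(XML);
        System.out.println("metadata namespace: " + ns[0]);
        System.out.println("child namespace:    " + ns[1]);
    }
}
```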


As far as I can tell, the schemas look fine. My assumption, at this point,
is that Xerces is at fault and my only recourse is to turn off validation.
I must also admit the possibility that my program is flawed in some way. On
the slim chance that I've found the 2nd and 3rd XML schema errors within
the span of a week, though, I thought I'd pass along my findings.
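
A minimal sketch of the fallback mentioned above, assuming plain JAXP: with setValidating(false) the parser no longer checks the document against a schema, but it still rejects input that is not well-formed. The class and method names and the two sample strings are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class WellFormedOnly {
    // Returns true if the text parses with validation turned off.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);      // no schema or DTD validation
            factory.setNamespaceAware(true);
            DocumentBuilder parser = factory.newDocumentBuilder();
            parser.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) {
                    // Warnings are ignored in this sketch.
                }
                public void error(SAXParseException e) throws SAXParseException {
                    throw e;
                }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });
            parser.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for input that is not well-formed.
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<record><header/></record>"));
        System.out.println(isWellFormed("<record><header></record>"));
    }
}
```

The trade-off is that a harvester running this way accepts any well-formed response, so structural mistakes in a repository's output go unnoticed until something downstream trips over them.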

 <<Test.java>> 
Cheers,

Jeff

---
Jeffrey A. Young
Senior Consulting Systems Analyst
Office of Research, Mail Code 710
OCLC Online Computer Library Center, Inc.
6565 Frantz Road
Dublin, OH   43017-3395
www.oclc.org

Voice:	614-764-4342
Voice:	800-848-5878, ext. 4342
Fax:	614-718-7477
Email:	jyoung@oclc.org




Attachment: Test.java

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.*;
import org.w3c.dom.*;
import org.xml.sax.*;

public class Test {
    public static void main(String[] args) {
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setNamespaceAware(true);
            DocumentBuilder parser = factory.newDocumentBuilder();
            // Without an ErrorHandler, validation errors are not reported.
            parser.setErrorHandler(new ErrorHandler() {
                public void fatalError(SAXParseException e)
                    throws SAXException {
                    System.out.println("fatalError");
                    throw e;
                }
                public void error(SAXParseException e)
                    throws SAXParseException {
                    System.out.println("error");
                    throw e;
                }
                public void warning(SAXParseException e)
                    throws SAXParseException {
                    System.out.println("** Warning, line " + e.getLineNumber() +
                                       ", uri " + e.getSystemId());
                    System.out.println(" " + e.getMessage());
                    e.printStackTrace();
                }
            });
            // As sent, the string ends at </record>; the handler above throws
            // on the first validation error, before the parser reaches the
            // missing </ListRecords>.
            String xml =
                "<ListRecords"
                + " xmlns=\"http://www.openarchives.org/OAI/1.0/OAI_ListRecords\""
                + " xmlns:xsi=\"http://www.w3.org/2000/10/XMLSchema-instance\""
                + " xsi:schemaLocation=\"http://www.openarchives.org/OAI/1.0/OAI_ListRecords"
                + " http://www.openarchives.org/OAI/1.0/OAI_ListRecords.xsd\">"
                + "<responseDate>2001-04-24T13:06:23-05:00</responseDate>"
                + "<requestURL>http://alcme.oclc.org:4342/etdcat/servlet/OAIHandler"
                + "?metadataPrefix=oai_dc&amp;until=2001-04-23&amp;verb=ListRecords"
                + "&amp;from=2000-01-01</requestURL>"
                + "<record><header>"
                + "<identifier>oai:etdcat:ocm02999966</identifier>"
                + "<datestamp>2001-02-02</datestamp></header>"
                + "<metadata><dc"
                + " xmlns=\"http://purl.org/dc/elements/1.1/\""
                + " xmlns:xsi=\"http://www.w3.org/2000/10/XMLSchema-instance\""
                + " xsi:schemaLocation=\"http://purl.org/dc/elements/1.1/"
                + " http://www.openarchives.org/OAI/dc.xsd\">"
                + "<language>eng</language><date>1976</date><type>Text data</type>"
                + "<creator>Singh, Bibhuti Narayan,--1945-</creator>"
                + "<title>Gibberellin metabolism in higher plant tissues.</title>"
                + "<format>[4], xii, 124 leaves</format><format>29 cm.</format>"
                + "<description>Microfilm copy of typescript. Ann Arbor, "
                + "University Microfilms, 1977. 1 reel. 35 mm.</description>"
                + "<subject>Plants, Effect of gibberellins on.</subject>"
                + "<subject>Plants, Effect of gibberellic acid on</subject>"
                + "<subject>Gibberellic acid</subject>"
                + "</dc></metadata></record>";
            StringReader sr = new StringReader(xml);
            InputSource is = new InputSource(sr);
            Document doc = parser.parse(is);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}