[OAI-implementers] pointies in abstracts

Tim Cole <t-cole3@uiuc.edu>
Mon, 2 Jul 2001 09:30:55 -0500


----- Original Message -----
From: "Ben Henley" <ben@biomedcentral.com>
To: <oai-implementers@oaisrv.nsdl.cornell.edu>
Sent: Monday, July 02, 2001 5:34 AM
Subject: [OAI-implementers] pointies in abstracts


>
>
> A quick question:
>
> What's the best way to handle "pointies" (greater-than and
> less-than) inside record elements? If an abstract happens to contain a
> phrase like "we studied the mating behaviour of horses at < 5 degrees
> Kelvin", this can obviously stop the XML being well-formed.
> Currently we are encoding *all* character entities (including
> ampersand, degrees sign etc.) as HTML entities. Is this right?
>

Per the XML 1.0 spec (2nd edition, Oct 6, 2000) as maintained by the W3C,
the only pre-defined entities in XML are:

&amp;
&lt;
&gt;
&apos;
&quot;

All other [named] entity references must be declared.  This takes care of
your "pointies" (&gt; and &lt;) but obviously the XML pre-defined set of
named entity references is not as extensive as the HTML 4.0 set of
pre-defined named entity references.  The XML set however is extensible for
a given XML application (see below).

XML does intrinsically recognize numeric character references of the form:

CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'

as a valid way to refer to specific characters in the ISO/IEC 10646
character set (i.e., Unicode).  Thus it will recognize &#62; and &#x003E; as
valid character references for the '>' character.  If starting from scratch
you may wish to use numeric character references of this form in preference
to named entity references.
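
As a rough illustration of the numeric-reference approach, here is a minimal
Python sketch.  It assumes the abstract is available as a Unicode string; the
function name to_xml_text and the <description> wrapper element are
hypothetical, chosen only for this example.  It escapes the markup characters
with the pre-defined entities and writes every non-ASCII character (such as
the degree sign) as a decimal character reference, then parses the result to
confirm it is well-formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_xml_text(text: str) -> str:
    """Escape markup characters, then write non-ASCII characters as &#NNN;."""
    # escape() rewrites & as &amp;, < as &lt; and > as &gt;
    escaped = escape(text)
    # xmlcharrefreplace turns each non-ASCII character into a decimal
    # numeric character reference, so no named HTML entity is needed.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Hypothetical abstract text containing a "pointy", a degree sign and an ampersand.
abstract = "horses at < 5 \u00b0K & below"
encoded = to_xml_text(abstract)

# Wrap in a hypothetical element and parse to check well-formedness.
record = "<description>" + encoded + "</description>"
parsed = ET.fromstring(record)
```

After parsing, parsed.text should be the original string again, which is a
quick way to check that nothing was lost in the round trip.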

If you need to use named entity references for legacy or other reasons,
then to conform to XML you'll need to include named entity declarations in
your XML application.  This can be done in various ways, e.g., in an internal
subset in each XML document or by associating an external DTD containing all
the necessary named entity declarations with your XML documents.  We've
found that most XML parsers will accept DTDs that contain only a listing of
named entity declarations (i.e., you don't have to include element
declarations describing document structure) -- HOWEVER, we've not attempted
to include named entities (other than the 5 pre-defined for all XML) in our
OAI applications.  We've always used numeric character references.  I assume
properly declared named entities can be used in OAI context, but I'll defer
to Hussein or someone else who may actually have done this.
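
For what it's worth, the internal-subset option can be sketched at the parser
level as follows.  This is only an illustration with Python's standard
ElementTree parser, not a statement about what OAI harvesters accept; the
<description> element and the single "deg" declaration are assumptions made
for the example.  It shows a declared named entity being expanded, and the
same reference being rejected when the declaration is missing.

```python
import xml.etree.ElementTree as ET

# Document with an internal subset that declares one named entity.
declared = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE description [\n'
    '  <!ENTITY deg "&#176;">\n'
    ']>\n'
    '<description>horses at &lt; 5 &deg;K</description>'
)
# The parser expands &deg; using the declaration above.
declared_text = ET.fromstring(declared).text

# Same content without the declaration: &deg; is not one of the five
# pre-defined XML entities, so the parser reports an error.
undeclared = "<description>horses at &lt; 5 &deg;K</description>"
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
```

The pre-defined &lt; needs no declaration in either document; only the
HTML-style &deg; depends on the DOCTYPE.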

Tim Cole
Library, University of Illinois at Urbana-Champaign

> Sorry if this has been discussed already.
> Thanks,
>
> Ben
>
> Ben Henley <mailto:ben@biomedcentral.com>
> Special Projects Editor
> BioMed Central
> http://www.biomedcentral.com
>
>
>
> _______________________________________________
> OAI-implementers mailing list
> OAI-implementers@oaisrv.nsdl.cornell.edu
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
>