[OAI-implementers] Superscript- and subscript-nodes

Thomas G. Habing thabing at uiuc.edu
Wed Sep 21 13:08:20 EDT 2005

Dr. Elmar Haake wrote:
> Dear Thomas,
> thank you for your answer!
> Our concrete problem is implementing a chemical formula like
> 2-(Me_2 NCH_3 )C_6 H_4
> How should we deal with this example?

Using Unicode numeric entities, something like this:


I put together a quick test xml file at:

Using the Firefox web browser it looks fine with all of the subscripts 
appearing, but using IE the _6 does not appear.  I have a lot of fonts 
installed on my workstation, so your results may vary.
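
As a rough illustration of the numeric-entity approach for the formula 
quoted above, here is a minimal Python sketch.  The helper name 
to_subscript_entities and the "_<digits>" input convention are 
assumptions made for this example only; the sketch relies on the 
Unicode subscript digits occupying U+2080..U+2089.

```python
import re

# Unicode subscript digits 0-9 occupy U+2080..U+2089.
SUBSCRIPT_ZERO = 0x2080


def to_subscript_entities(formula: str) -> str:
    """Replace each `_<digits>` run (and one following space, if any)
    with hexadecimal XML numeric character references."""
    def repl(match: re.Match) -> str:
        return "".join(
            f"&#x{SUBSCRIPT_ZERO + int(digit):X};" for digit in match.group(1)
        )

    return re.sub(r"_(\d+) ?", repl, formula)


print(to_subscript_entities("2-(Me_2 NCH_3 )C_6 H_4"))
# prints: 2-(Me&#x2082;NCH&#x2083;)C&#x2086;H&#x2084;
```

Whether the resulting characters display correctly still depends on 
the fonts available to the client.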

Another option would be to define your own metadata schema which allows 
mixed content with embedded <sub> and <sup> elements.  You could even 
define your own schema that allows MathML to be embedded in certain 
elements.  We have some experimental XML metadata schemas that allow 
this, but the problem is that most OAI service providers are not 
prepared to deal with any schema beyond simple oai_dc.  Some harvesters 
are beginning to support MODS, Qualified DC, or others, but this still 
doesn't solve the problem of embedded markup such as required by 

> Elmar
> Thomas G. Habing wrote:
>> Dr. Elmar Haake wrote:
>>> Hi,
>>> Since we are acting as both an OAI data provider and an OAI service
>>> provider, we are interested in supporting special characters in our
>>> repository. We assume it must be possible to use UTF-8-encoded numeric
>>> entities in the OAI 2.0 interface. But what about superscripted
>>> characters like <sup>2+</sup>?
>>> These characters do not have numeric entities, so I would like to ask
>>> you about this case.
>>> We think it must be possible to output the node HTML-style
>>> (e.g. <sup>2%2B</sup>), because the service provider could parse it
>>> with an XSLT processor in a routine of its own.
>>> But the characters "<" and ">" are not URI-encoded, so it would not be
>>> possible to transfer them via the OAI interface in the described way. If
>>> we encode them, the XSLT processor cannot parse them as an HTML node
>>> (parsing is only possible with nodes beginning with <....>, not with
>>> &lt;...&gt;).
>>> So we are in a dilemma. One possible solution would be to strip the
>>> markup. But that would modify the content, which is not optimal for
>>> exchanging the metadata.
>>> Does anyone have any experience or ideas?
>>> Greetings
>>> Elmar
>> There are Unicode code points for various superscripted and 
>> subscripted characters in the ranges U+2070..U+209F plus U+00B2, 
>> U+00B3, and U+00B9.  Because of font issues, most systems cannot 
>> display more than just the numbers 1-3 as superscripts, but it might 
>> be adequate for your needs, for example, <sup>2+</sup> could be 
>> encoded as &#xB2;&#x207A;
>> If you need to represent more complex math in Unicode you might want 
>> to check out the technical report "Unicode Support for Mathematics" at 
>> http://www.unicode.org/reports/tr25/index.html and also
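
The superscript mapping described in the quoted reply is irregular, 
because the superscripts 1, 2 and 3 sit at U+00B9, U+00B2 and U+00B3 
while the other digits and signs are in the U+2070 block.  A minimal 
Python sketch of that mapping follows; the function name 
superscript_entities is an assumption for illustration.

```python
# Superscript digits default to U+2070 + digit, except 1-3, which were
# inherited from Latin-1 and keep their older code points.
SUPERSCRIPTS = {str(d): 0x2070 + d for d in range(10)}
SUPERSCRIPTS.update({"1": 0x00B9, "2": 0x00B2, "3": 0x00B3})
SUPERSCRIPTS.update({"+": 0x207A, "-": 0x207B})


def superscript_entities(text: str) -> str:
    """Encode each character of text as the numeric character reference
    of its Unicode superscript form; raises KeyError if none is mapped."""
    return "".join(f"&#x{SUPERSCRIPTS[ch]:X};" for ch in text)


print(superscript_entities("2+"))
# prints: &#xB2;&#x207A;
```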
