[OAI-implementers] Superscript- and subscript-nodes
Thomas G. Habing
thabing at uiuc.edu
Wed Sep 21 13:08:20 EDT 2005
Dr. Elmar Haake wrote:
> Dear Thomas,
> thank you for your answer!
> Our concrete problem is implementing a chemical summary formula like
> 2-(Me_2 NCH_3 )C_6 H_4
> How should we deal with this example?
Using Unicode numeric entities, something like this:
I put together a quick test xml file at:
Using the Firefox web browser it looks fine with all of the subscripts
appearing, but using IE the _6 does not appear. I have a lot of fonts
installed on my workstation, so your results may vary.
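
For illustration, the numeric-entity approach for the formula above could be sketched in Python as follows. This is only a rough sketch, not the content of the test file: the function names are made up for this example, and it assumes that "_" followed by digits marks a subscript and that the single space after it is just a terminator to be dropped.

```python
import re

# Unicode subscript digits occupy U+2080..U+2089.
SUBSCRIPT_DIGITS = {str(d): chr(0x2080 + d) for d in range(10)}


def to_unicode_subscripts(formula: str) -> str:
    """Replace each `_<digits>` run with Unicode subscript digits.

    Assumption: one optional space after the digits only terminates
    the subscript and is removed.
    """
    def repl(match):
        return "".join(SUBSCRIPT_DIGITS[c] for c in match.group(1))

    return re.sub(r"_([0-9]+) ?", repl, formula)


def to_numeric_entities(text: str) -> str:
    """Write every non-ASCII character as a hexadecimal numeric entity."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):04X};" for c in text)


formula = to_unicode_subscripts("2-(Me_2 NCH_3 )C_6 H_4")
print(formula)
print(to_numeric_entities(formula))
```

Whether the resulting characters display correctly still depends on the fonts available to the reader, as noted above.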
Another option would be to define your own metadata schema which allows
mixed content with embedded <sub> and <sup> elements. You could even
define your own schema that allows MathML to be embedded in certain
elements. We have some experimental XML metadata schemas that allow
this, but the problem is that most OAI service providers are not
prepared to deal with any schema beyond simple oai_dc. Some harvesters
are beginning to support MODS, Qualified DC, or others, but this still
doesn't solve the problem of embedded markup such as required by
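
To make the mixed-content option more concrete, here is a small Python sketch that builds an element with embedded <sub> children. The element name "formula" and the helper build_formula are hypothetical and do not come from any existing schema; they are only meant to show the shape of the markup.

```python
import xml.etree.ElementTree as ET


def build_formula(parts):
    """Build a mixed-content element from (text, subscript) pairs.

    `subscript` may be None when a text run has no subscript after it.
    The element name "formula" is a placeholder for this sketch.
    """
    root = ET.Element("formula")
    last = None  # most recently added <sub> element, if any
    for text, sub in parts:
        if last is None:
            root.text = (root.text or "") + text
        else:
            # Text following a child element is stored in its tail.
            last.tail = (last.tail or "") + text
        if sub is not None:
            last = ET.SubElement(root, "sub")
            last.text = sub
    return root


root = build_formula([("2-(Me", "2"), ("NCH", "3"), (")C", "6"), ("H", "4")])
print(ET.tostring(root, encoding="unicode"))

# A consumer that ignores the markup only sees the flattened text.
print("".join(root.itertext()))
```

The second print shows what is left when the embedded elements are discarded, which is the loss of information a harvester limited to plain text would incur.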
> Thomas G. Habing schrieb:
>> Dr. Elmar Haake wrote:
>>> Since we are acting as an OAI-Data- and OAI-Service-Provider we are
>>> interested in implementing special characters in our repository.
>>> We guess that it must be possible to implement UTF-8-coded numeric
>>> entities in the OAI2.0-interface. But what about superscripted
>>> characters like <sup>2+</sup>?
>>> These characters do not have numeric entities, so I would like to ask
>>> you about this case.
>>> We think it must be possible to output the node the way HTML does
>>> (e.g. <sup>2%2B</sup>), because the service-provider could parse it
>>> via an XSLT-processor in its own routine.
>>> But the characters "<" and ">" are not URI-encoded, so it would not be
>>> possible to transfer them via the OAI-interface in the described way. If
>>> we encode them, the XSLT-Processor cannot parse it to the HTML-node
>>> (parsing is only possible with nodes beginning with <....>, not with
>>> So we are in a dilemma. One possible solution would be to strip
>>> them out, but that would modify the content, which is
>>> not ideal for exchanging the metadata.
>>> Does anyone have experience with this, or any ideas?
>> There are Unicode code points for various superscripted and
>> subscripted characters in the ranges U+2070..U+209F plus U+00B2,
>> U+00B3, and U+00B9. Because of font issues, most systems cannot
>> display more than just the numbers 1-3 as superscripts, but it might
>> be adequate for your needs, for example, <sup>2+</sup> could be
>> encoded as ²⁺
>> If you need to represent more complex math in Unicode you might want
>> to check out the technical report "Unicode Support for Mathematics" at
>> http://www.unicode.org/reports/tr25/index.html and also