[OAI-implementers] Superscript- and subscript-nodes
Dr. Elmar Haake
haake at suub.uni-bremen.de
Thu Sep 22 06:16:44 EDT 2005
With your help, I think we have now found the solution.
To exchange the data via the OAI 2.0 protocol, we can use the UTF-8
characters as you described. To display the data, we can transform the
UTF-8 characters into HTML nodes or named entities. (We are using Cocoon
from the Apache group, so specific UTF-8 characters pose no problems.)
We'll try to test it.
Thank you very much, Thomas!
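
A minimal sketch of the round trip described above, written in Python purely for illustration (our real pipeline would use Cocoon/XSLT). The function names markup_to_unicode and unicode_to_markup are hypothetical, and only digits, "+" and "-" are mapped; anything else is left untouched.

```python
import re

# Unicode superscripts/subscripts: U+2070..U+209F plus U+00B2, U+00B3, U+00B9.
# Only digits, "+" and "-" are covered in this sketch.
SUP = {"0": "\u2070", "1": "\u00b9", "2": "\u00b2", "3": "\u00b3",
       "4": "\u2074", "5": "\u2075", "6": "\u2076", "7": "\u2077",
       "8": "\u2078", "9": "\u2079", "+": "\u207a", "-": "\u207b"}
SUB = {str(d): chr(0x2080 + d) for d in range(10)}
SUB.update({"+": "\u208a", "-": "\u208b"})


def markup_to_unicode(text):
    """For OAI exchange: replace <sup>/<sub> elements with Unicode characters."""
    def repl(m):
        table = SUP if m.group(1) == "sup" else SUB
        body = m.group(2)
        if all(ch in table for ch in body):
            return "".join(table[ch] for ch in body)
        return m.group(0)  # no complete mapping: leave the markup as it is
    return re.sub(r"<(sup|sub)>(.+?)</\1>", repl, text)


def unicode_to_markup(text):
    """For display: turn runs of Unicode super/subscripts back into HTML nodes."""
    for tag, table in (("sup", SUP), ("sub", SUB)):
        reverse = {uni: plain for plain, uni in table.items()}
        pattern = "[" + "".join(reverse) + "]+"

        def repl(m, reverse=reverse, tag=tag):
            plain = "".join(reverse[ch] for ch in m.group(0))
            return "<{0}>{1}</{0}>".format(tag, plain)

        text = re.sub(pattern, repl, text)
    return text


formula = "2-(Me<sub>2</sub>NCH<sub>3</sub>)C<sub>6</sub>H<sub>4</sub>"
encoded = markup_to_unicode(formula)   # what would go into the OAI record
decoded = unicode_to_markup(encoded)   # what would be shown in HTML
```

Here encoded holds the formula with subscript characters only, and decoded should match the original markup again, so no content is lost in the exchange.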
Thomas G. Habing wrote:
> Dr. Elmar Haake wrote:
>> Dear Thomas,
>> thank you for your answer!
>> Our concrete problem is representing a molecular formula such as
>> 2-(Me_2 NCH_3 )C_6 H_4
>> How should we deal with this example?
> Using Unicode numeric entities, something like this:
> I put together a quick test xml file at:
> Using the Firefox web browser it looks fine with all of the subscripts
> appearing, but using IE the _6 does not appear. I have a lot of fonts
> installed on my workstation, so your results may vary.
> Another option would be to define your own metadata schema which
> allows mixed content with embedded <sub> and <sup> elements. You
> could even define your own schema that allows MathML to be embedded in
> certain elements. We have some experimental XML metadata schemas that
> allow this, but the problem is that most OAI service providers are not
> prepared to deal with any schema beyond simple oai_dc. Some
> harvesters are beginning to support MODS, Qualified DC, or others, but
> this still doesn't solve the problem of embedded markup such as
> required by mathematics.
>> Thomas G. Habing wrote:
>>> Dr. Elmar Haake wrote:
>>>> Since we are acting as both an OAI data provider and an OAI service
>>>> provider, we are interested in supporting special characters in our
>>>> repository.
>>>> We assume it must be possible to use UTF-8-encoded numeric
>>>> entities in the OAI 2.0 interface. But what about superscripted
>>>> characters like <sup>2+</sup>?
>>>> These characters do not have numeric entities, so I would like to
>>>> ask you about this case.
>>>> We think it must be possible to output the node as in HTML
>>>> (e.g. <sup>2%2B</sup>), because the service provider could parse it
>>>> with an XSLT processor in its own routine.
>>>> But the characters "<" and ">" are not URI-encoded, so it would not be
>>>> possible to transfer them via the OAI interface in the described
>>>> way. If we encode them, the XSLT processor cannot parse the result
>>>> into an HTML node
>>>> (parsing is only possible with nodes beginning with <....>, not with
>>>> So we are in a dilemma. One possible solution would be to strip
>>>> them out. But that would modify the content, which is
>>>> not ideal for exchanging metadata.
>>>> Does anyone have experience with this, or any ideas?
>>> There are Unicode code points for various superscripted and
>>> subscripted characters in the ranges U+2070..U+209F plus U+00B2,
>>> U+00B3, and U+00B9. Because of font issues, most systems cannot
>>> display more than just the numbers 1-3 as superscripts, but it might
>>> be adequate for your needs, for example, <sup>2+</sup> could be
>>> encoded as ²⁺
>>> If you need to represent more complex math in Unicode you might want
>>> to check out the technical report "Unicode Support for Mathematics"
>>> at http://www.unicode.org/reports/tr25/index.html and also
Dr. Elmar Haake
Fachreferat Chemie, IT Dienste
Staats- und Universitaetsbibliothek Bremen
Tel.: 0049 421 218 3612
Fax: 0049 421 218 2614