[OAI-implementers] Superscript- and subscript-nodes

Dr. Elmar Haake haake at suub.uni-bremen.de
Thu Sep 22 06:16:44 EDT 2005

With your help, I think we have now found a solution.
To exchange the data via the OAI 2.0 protocol, we can use the UTF-8 
characters as you described; to display the data, we can transform the 
UTF-8 characters into HTML nodes or named entities. (We are using Cocoon 
from the Apache group, so specific UTF-8 characters should cause no 
problems.)
We will test it.
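
For illustration, the two directions could be sketched roughly as follows. This is only a minimal sketch in Python, not our Cocoon pipeline: the function names (markup_to_unicode, unicode_to_html, to_numeric_entities) are hypothetical, only digits, "+" and "-" are covered, and the text is assumed to be HTML-escaped already. Markup that cannot be mapped is left unchanged.

```python
import re

# Unicode superscript/subscript code points for digits, "+" and "-".
# Superscript 1-3 live in Latin-1 (U+00B9, U+00B2, U+00B3); the rest are in
# the U+2070..U+209F block mentioned in the thread.
SUPERSCRIPTS = {
    "0": "\u2070", "1": "\u00b9", "2": "\u00b2", "3": "\u00b3", "4": "\u2074",
    "5": "\u2075", "6": "\u2076", "7": "\u2077", "8": "\u2078", "9": "\u2079",
    "+": "\u207a", "-": "\u207b",
}
SUBSCRIPTS = {str(d): chr(0x2080 + d) for d in range(10)}
SUBSCRIPTS.update({"+": "\u208a", "-": "\u208b"})

# Reverse tables for the display direction.
SUP_BACK = {v: k for k, v in SUPERSCRIPTS.items()}
SUB_BACK = {v: k for k, v in SUBSCRIPTS.items()}

TAG_RE = re.compile(r"<(sup|sub)>(.*?)</\1>")
SUP_RUN = re.compile("[" + "".join(SUP_BACK) + "]+")
SUB_RUN = re.compile("[" + "".join(SUB_BACK) + "]+")


def markup_to_unicode(text):
    """For OAI exchange: replace <sup>/<sub> markup by Unicode characters."""
    def repl(match):
        table = SUPERSCRIPTS if match.group(1) == "sup" else SUBSCRIPTS
        content = match.group(2)
        if all(ch in table for ch in content):
            return "".join(table[ch] for ch in content)
        return match.group(0)  # no mapping available: keep markup as it is
    return TAG_RE.sub(repl, text)


def unicode_to_html(text):
    """For display: wrap runs of super/subscript characters in HTML nodes."""
    text = SUP_RUN.sub(
        lambda m: "<sup>" + "".join(SUP_BACK[c] for c in m.group(0)) + "</sup>",
        text)
    text = SUB_RUN.sub(
        lambda m: "<sub>" + "".join(SUB_BACK[c] for c in m.group(0)) + "</sub>",
        text)
    return text


def to_numeric_entities(text):
    """Write all non-ASCII characters as decimal numeric character references."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


formula = "2-(Me<sub>2</sub>NCH<sub>3</sub>)C<sub>6</sub>H<sub>4</sub>"
encoded = markup_to_unicode(formula)   # form to put into the OAI record
print(to_numeric_entities(encoded))    # ASCII-only variant with entities
print(unicode_to_html(encoded))        # form for display in HTML
```

In this sketch the conversion for display is the exact inverse of the conversion for exchange, so the content itself is not modified on the way through the OAI interface.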

Thank you very much, Thomas!

Thomas G. Habing wrote:

> Dr. Elmar Haake wrote:
>> Dear Thomas,
>> thank you for your answer!
>> Our concrete problem is representing a chemical molecular formula such as 
>> 2-(Me_2 NCH_3 )C_6 H_4
>> How should we handle this example?
> Using Unicode numeric entities, something like this:
>   2-(Me₂NCH₃)C₆H₄
> I put together a quick test xml file at:
> Using the Firefox web browser it looks fine with all of the subscripts 
> appearing, but using IE the _6 does not appear.  I have a lot of fonts 
> installed on my workstation, so your results may vary.
> Another option would be to define your own metadata schema which 
> allows mixed content with embedded <sub> and <sup> elements.  You 
> could even define your own schema that allows MathML to be embedded in 
> certain elements.  We have some experimental XML metadata schemas that 
> allow this, but the problem is that most OAI service providers are not 
> prepared to deal with any schema beyond simple oai_dc.  Some 
> harvesters are beginning to support MODS, Qualified DC, or others, but 
> this still doesn't solve the problem of embedded markup such as 
> required by mathematics.
>> Elmar
>> Thomas G. Habing wrote:
>>> Dr. Elmar Haake wrote:
>>>> Hi,
>>>> Since we act as both an OAI data provider and an OAI service provider,
>>>> we are interested in supporting special characters in our repository.
>>>> We assume it is possible to use UTF-8-encoded numeric entities in the
>>>> OAI 2.0 interface. But what about superscripted characters like
>>>> <sup>2+</sup>?
>>>> These characters do not have numeric entities, so I would like to ask
>>>> you about this case.
>>>> We think it should be possible to output the node in HTML-like form
>>>> (e.g. <sup>2%2B</sup>), because the service provider could process it
>>>> with an XSLT processor in its own routine.
>>>> But the characters "<" and ">" are not URI-encoded, so it would not be
>>>> possible to transfer them via the OAI interface in the described way.
>>>> If we encode them, the XSLT processor cannot turn them into an HTML
>>>> node (parsing is only possible with nodes beginning with <....>, not
>>>> with &lt;...&gt;).
>>>> So we are in a dilemma. One possible solution would be to remove
>>>> them. But that would modify the content, which is not ideal for
>>>> exchanging metadata.
>>>> Does anyone have experience with this or any ideas?
>>>> Greetings
>>>> Elmar
>>> There are Unicode code points for various superscripted and 
>>> subscripted characters in the ranges U+2070..U+209F plus U+00B2, 
>>> U+00B3, and U+00B9.  Because of font issues, most systems cannot 
>>> display more than just the numbers 1-3 as superscripts, but it might 
>>> be adequate for your needs, for example, <sup>2+</sup> could be 
>>> encoded as &#xB2;&#x207A;
>>> If you need to represent more complex math in Unicode you might want 
>>> to check out the technical report "Unicode Support for Mathematics" 
>>> at http://www.unicode.org/reports/tr25/index.html and also

Dr. Elmar Haake
Fachreferat Chemie, IT Dienste
Staats- und Universitaetsbibliothek Bremen

D-28359 Bremen

Tel.: 0049 421 218 3612
Fax:  0049 421 218 2614

aktuell: http://suche.suub.uni-bremen.de
