[OAI-implementers] Special characters, UNICODE, and OAI tools
Tue, 13 Feb 2001 02:16:42 -0500 (EST)
On Mon, 12 Feb 2001, Caroline Arms wrote:
> Several (although not all) special characters are coming through when I
> use ARC with Netscape 4.7 on Windows. Internet Explorer 5.5 doesn't do any better
> than Netscape 4.7. Also, not coming through are a few "XML sanity"
> entities, which we have been expressing as "old-fashioned" character
> entities. I don't claim to be an XML character encoding expert; for OAI
> we accepted the recommendation of our standards office to keep using this
> handful of character entities (e.g. ') in that form. What do others
> think the practice should be on these? They presumably validate against
> the schema because they get through Hussein's Explorer.
Thanks for the message. It's **partially** solved now. The part that is
solved was a bug in my program. For the part I have not solved, I need more
"Entity Reference"s are widely used in the LoC archive. Under DOM Level 1,
an "Entity Reference" may or may **not** be expanded into Unicode during
parsing. Unfortunately, the parser I am using (the Java XML parser from
Sun) expands them inconsistently. I did not notice this problem before
and treated all of them as expanded. So sample URL 1 is working fine after
In sample 2, some expanded "Entity Reference"s are not correctly processed.
I have to do more testing.
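
To illustrate the expansion issue, here is a minimal sketch using the
standard JAXP DOM API. It is not the ARC code. The class name EntityText
and the helpers parse and textOf are hypothetical names chosen for this
example. The idea is to collect text in a way that does not depend on
whether the parser expanded a reference: descend into an "Entity Reference"
node when it has children, and keep the literal "&name;" form when it has
none, so nothing is dropped silently.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class EntityText {

    // Parse an XML string. The flag is passed to the standard JAXP switch
    // that asks the parser to expand entity references (true) or to keep
    // them as EntityReference nodes (false).
    static Document parse(String xml, boolean expandEntities) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setExpandEntityReferences(expandEntities);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Return the character data below a node, whether or not the parser
    // expanded the entity references it found.
    static String textOf(Node node) {
        StringBuilder out = new StringBuilder();
        collect(node, out);
        return out.toString();
    }

    private static void collect(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                out.append(node.getNodeValue());
                break;
            case Node.ENTITY_REFERENCE_NODE:
                if (node.hasChildNodes()) {
                    // The parser attached the replacement text as children.
                    collectChildren(node, out);
                } else {
                    // No replacement text available: keep the reference itself.
                    out.append('&').append(node.getNodeName()).append(';');
                }
                break;
            default:
                // Elements and other containers: visit the children.
                // Comments and processing instructions have none, so they
                // contribute nothing.
                collectChildren(node, out);
        }
    }

    private static void collectChildren(Node node, StringBuilder out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }
}
```

With expansion switched on, a title such as "d&apos;une" comes back as
plain text with the apostrophe in place. With expansion switched off, what
ends up under an "Entity Reference" node differs between parser versions,
which is why the sketch handles both the case with children and the case
without.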
> Sample GetRecord URLs that show the issues are:
> Title includes apostrophe in d'une
> 4 special czech characters (regular letters with diacritics)
> Any thoughts and experiences welcome.
> Thanks. Caroline Arms firstname.lastname@example.org
> National Digital Library Program
> Library of Congress
> OAI-implementers mailing list