[OAI-implementers] Special characters, UNICODE, and OAI tools

Xiaoming Liu liu_x@cs.odu.edu
Tue, 13 Feb 2001 02:16:42 -0500 (EST)


hi,

On Mon, 12 Feb 2001, Caroline Arms wrote:

> 
> Several (although not all) special characters are coming through when I
> use ARC with Netscape 4.7 on Windows.  Internet Explorer 5.5 doesn't do any better
> than Netscape 4.7.  Also, not coming through are a few "XML sanity"
> entities, which we have been expressing as "old-fashioned" character
> entities.  I don't claim to be an XML character encoding expert; for OAI
> we accepted the recommendation of our standards office to keep using this
> handful of character entities (e.g. ') in that form.  What do others
> think the practice should be on these?  They presumably validate against
> the schema because they get through Hussein's Explorer.
> 

Thanks for the message. The problem is **partially** solved now. The part
that is solved was a bug in my program. The part that is not yet solved
needs more testing.

Solved part:

Entity references are widely used in the LOC archive. As DOM Level 1
specifies in
http://www.w3.org/TR/2000/WD-DOM-Level-1-20000929/level-one-core.html#ID-11C98490
, an "Entity Reference" may or may **not** be expanded into Unicode
during parsing. Unfortunately, the parser I am using (the Java XML parser
from Sun) expands them inconsistently, seemingly at random. I had not
noticed this before and treated every reference as already expanded.
Sample URL 1 works fine after the bug fix.
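
For anyone who hits the same thing, here is a minimal sketch of one way to
read element text without depending on whether the parser expanded the
references. It is not the actual ARC code. It assumes the standard JAXP
classes (javax.xml.parsers, org.w3c.dom); the class name EntityText, its
method names, and the sample XML with its "eacute" entity are made up for
illustration. The walker accepts both Text nodes and EntityReference
nodes, and the factory is also asked explicitly to expand references,
because an unexpanded EntityReference node may or may not carry its
replacement text as children, depending on the parser and its version.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class EntityText {

    // Parse XML from a string, explicitly requesting entity expansion
    // instead of relying on the parser's default behaviour.
    static Document parseExpanded(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setExpandEntityReferences(true); // replace references with their text
            factory.setCoalescing(true);             // merge CDATA and adjacent text nodes
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Collect the character data below a node.
    static String textOf(Node node) {
        StringBuilder out = new StringBuilder();
        collect(node, out);
        return out.toString();
    }

    private static void collect(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                out.append(node.getNodeValue());
                break;
            case Node.ELEMENT_NODE:
            case Node.ENTITY_REFERENCE_NODE:
                // Defensive: if a reference was left unexpanded, descend into
                // whatever children the parser attached to it.
                for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                    collect(c, out);
                }
                break;
            default:
                break; // comments, processing instructions, etc. carry no text
        }
    }

    public static void main(String[] args) {
        // Made-up sample: one predefined entity and one declared in the DTD.
        String xml = "<!DOCTYPE title [<!ENTITY eacute \"&#233;\">]>"
                   + "<title>d&apos;une caf&eacute;</title>";
        Document doc = parseExpanded(xml);
        System.out.println(textOf(doc.getDocumentElement()));
    }
}
```

The point of the design is that the code never assumes a single Text
child under an element; it walks all children, so a mix of Text and
EntityReference nodes does not silently drop characters.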

Not solved:
In sample 2, some expanded entity references are still not processed
correctly. I have to do more testing.

regards,
liu





> Sample GetRecord URLs that show the issues are:
> 
> http://memory.loc.gov/cgi-bin/oai1_0?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lcoa1:loc.music/musdi.213
> 
>   Title includes apostrophe in     d'une
> 
> and 
> 
> http://memory.loc.gov/cgi-bin/oai1_0?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lcoa1:loc.music/musdi.215
> 
>   4 special czech characters (regular letters with diacritics)
> 
> 
>    Any thoughts and experiences welcome.  
> 
>    Thanks.                       Caroline Arms              caar@loc.gov
>                                  National Digital Library Program
>                                  Library of Congress
> 
> _______________________________________________
> OAI-implementers mailing list
> OAI-implementers@oaisrv.nsdl.cornell.edu
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
>