[OAI-implementers] Perl (5.005), utf-8 and special characters: is this a FAQ?

Simeon Warner simeon@cs.cornell.edu
Thu, 26 Dec 2002 18:35:50 -0500 (EST)


On Fri, 27 Dec 2002, Marin Balgarensky wrote:
> Hi all,
> 
> first of all, best wishes for the New Year!
> 
> Can anybody tell me how to handle special characters like ö
> in the XML output?

The appropriate UTF-8 representation of the character should be used.

> I thought Perl and the XML::Writer are doing
> the conversion automatically, but for now I am getting the error:

XML::Writer just escapes the characters that are special in XML
(that is & < > "). Everything else is expected to be in the appropriate
character encoding already.
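
To illustrate the point, here is a rough sketch of what markup escaping
does and does not do. The escape_xml helper is hypothetical and is not
XML::Writer's actual code; it only mimics the behaviour described above.

```perl
use strict;

# Hypothetical helper (not taken from XML::Writer): escape only the
# characters that are special in XML markup, leave all other bytes alone.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # ampersand first, so later entities survive
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# An entity reference typed into the data has its ampersand escaped.
print escape_xml("B&ouml;l"), "\n";

# A raw ISO-8859-1 byte (0xF6, o with diaeresis) passes through untouched,
# so the output is not valid UTF-8 unless the input already was.
my $escaped = escape_xml("B\xF6l");
print join(' ', map { sprintf '%02x', ord($_) } split(//, $escaped)), "\n";
```

This is also why entity references such as &ouml; written into the text
come out as &amp;ouml; in the output.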
 
> An invalid XML character (Unicode: 0x1b2ea5) was found in the element
> content of the document.

I have no idea how you might get character 0x1b2ea5 -- this is not a 
valid Unicode character (code points end at 0x10FFFF). &ouml; should be 0xf6

(from: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt)
00F6;LATIN SMALL LETTER O WITH DIAERESIS
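
As a small worked example, the UTF-8 form of U+00F6 can be computed by
hand. The utf8_bytes helper below is hypothetical and only covers code
points below 0x800, which is enough for this character.

```perl
use strict;

# Hypothetical helper: UTF-8 bytes for a code point below 0x800.
sub utf8_bytes {
    my ($cp) = @_;
    return ($cp) if $cp < 0x80;                      # ASCII: one byte
    die "outside the range of this sketch\n" if $cp >= 0x800;
    return (0xC0 | ($cp >> 6),                       # lead byte: 110xxxxx
            0x80 | ($cp & 0x3F));                    # continuation: 10xxxxxx
}

my @bytes = utf8_bytes(0xF6);
print join(' ', map { sprintf '%02X', $_ } @bytes), "\n";   # prints "C3 B6"

# The value from the error message lies beyond the last code point.
print "0x1b2ea5 is out of range\n" if 0x1b2ea5 > 0x10FFFF;
```

So in UTF-8 the character is the two bytes 0xC3 0xB6. A lone byte 0xF6
has the bit pattern 11110110, which has the shape of a four-byte UTF-8
lead byte; a parser reading it as UTF-8 could then arrive at an
out-of-range value, though that is only a guess at what happened here.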

My guess is that you are not correctly producing UTF-8 from whatever
source documents you have (which likely use some other character encoding).
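
If the source data turns out to be ISO-8859-1 (an assumption; I have
not seen your data), something along these lines would convert it
before it reaches XML::Writer. Both helper names are hypothetical, and
the sketch works on plain byte strings so it does not need the Encode
module, which only ships with Perl 5.8 and later.

```perl
use strict;

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub looks_like_utf8 {
    my ($bytes) = @_;
    return $bytes =~ /\A(?:
          [\x00-\x7F]
        | [\xC2-\xDF][\x80-\xBF]
        | \xE0[\xA0-\xBF][\x80-\xBF]
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
        | \xED[\x80-\x9F][\x80-\xBF]
        | \xF0[\x90-\xBF][\x80-\xBF]{2}
        | [\xF1-\xF3][\x80-\xBF]{3}
        | \xF4[\x80-\x8F][\x80-\xBF]{2}
    )*\z/x ? 1 : 0;
}

# Hypothetical helper: assumes ISO-8859-1 input. Each byte in the range
# 0x80-0xFF becomes a two-byte UTF-8 sequence; ASCII is left unchanged.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    $bytes =~ s/([\x80-\xFF])/chr(0xC0 | (ord($1) >> 6)) . chr(0x80 | (ord($1) & 0x3F))/ge;
    return $bytes;
}

my $name = "Tanja A. B\xF6l";                  # 0xF6 as found in Latin-1 data
$name = latin1_to_utf8($name) unless looks_like_utf8($name);
print join(' ', map { sprintf '%02x', ord($_) } split(//, $name)), "\n";
```

The check is only a heuristic: a short Latin-1 string can happen to be
well-formed UTF-8, so it is safer to know the encoding of each source
than to detect it.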
 
Cheers,
Simeon.


> respectively in IE:
> 
> An Invalid character was found in text content. Line 15, Position 28 
>  
>      <dc:creator>Tanja A. B?l</dc:creator>
> ---------------------------^
> 
> 
> The question mark is supposed to be the German o with the two dots...
> 
> If I encode those characters as HTML entities then they are not
> interpreted correctly by the reading program because the ampersands
> are escaped with &amp;.
> 
> For now I am using this approach. It is not quite correct but at least
> it is readable without errors...
> 
> Any help is very much appreciated,
> Marin
> 
>