[OAI-implementers] Perl (5.005), utf-8 and special characters: is this a FAQ?

Marin Balgarensky marinb@gmx.net
Fri, 27 Dec 2002 04:25:26 +0100 (MET)


Thanks Simeon,

> XML::Writer just escapes the characters that are special in XML
> (that is & < > "). Everything else is expected to be in the appropriate
> character encoding already.

This is my first experience with the writer, but it seems to do its
work very well...

>  
> > An invalid XML character (Unicode: 0x1b2ea5) was found in the element
> > content of the document.
> 
> I have no idea how you might get character 0x1b2ea5 -- this is not a 
> valid Unicode character. &ouml; should be 0xf6

Me neither. And you are right, it's 0xf6.

> (from: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt)
> 00F6;LATIN SMALL LETTER O WITH DIAERESIS
> 
> My guess is that you are not correctly producing UTF-8 from whatever
> source documents you have (which likely use some other character encoding).

Correct. I was too naive, thinking that Perl or XML::Writer would do
the conversion for me... I am now using the Unicode::String module
and it seems to work as expected...
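
For anyone hitting the same error, here is a rough sketch of what goes
wrong at the byte level. It is written in Python purely as an
illustration and is not the Perl/Unicode::String code mentioned above;
the sample word is made up. It assumes the source data is ISO-8859-1,
where ö is the single byte 0xF6. The last step is only a guess at where
0x1b2ea5 could come from: if a lenient parser takes 0xF6 as the lead
byte of a 4-byte UTF-8 sequence, the top bits of the reported value
match that byte.

```python
# Illustration only: Python stands in for the Perl and Unicode::String code
# actually used; the sample word is made up.
latin1_bytes = b"sch\xf6n"  # "schön" as stored in an ISO-8859-1 source

# 0xF6 on its own is not a valid UTF-8 sequence, so passing these bytes
# through unchanged produces a malformed UTF-8 document.
try:
    latin1_bytes.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False

# Decode with the real source encoding, then re-encode as UTF-8.
text = latin1_bytes.decode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

# Assumption: a lenient parser treats 0xF6 (binary 11110110) as the lead byte
# of a 4-byte sequence. The top three bits of the reported value would then
# come from the low three bits of that lead byte.
reported = 0x1B2EA5
implied_lead_byte = 0xF0 | (reported >> 18)

print(valid_as_utf8)           # False
print(utf8_bytes)              # b'sch\xc3\xb6n'
print(hex(implied_lead_byte))  # 0xf6
```

The point is that the conversion has to be done explicitly from the
known source encoding; escaping & < > " does not change the bytes of
any other character.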

Thanks again,
Marin
