[OAI-implementers] valid character encoding
Wed, 13 Aug 2003 11:27:55 -0400 (EDT)
On Wed, 13 Aug 2003, Todd White wrote:
> On Wed, 13 Aug 2003, Thomas G. Habing wrote:
> > The OAI spec mandates that all XML responses must be encoded as UTF-8.
> here's an example of a record that has a special character. i'm not sure if
> i'm handling it correctly. can anyone confirm?
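You have "mus\'ee" in the title and the e acute is not UTF-8 encoded. You
appear to have sent it as the single byte 0xE9, which is the ISO-8859-1
(Latin-1) code for this character:

0xE9 0x00E9 # LATIN SMALL LETTER E WITH ACUTE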
You might find my little utf8conditioner code helpful for checking
your UTF8 output:
simeon@ice ~>cat oai.xml | ~/src/utf8/utf8conditioner -c
Line 22, char 1181, byte 1181: byte 2 isn't continuation: 0xE9 0x65, restart at 0x65, substituted 0x3F
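
If utf8conditioner is not at hand, a similar check can be roughed out in
Python. This is only a sketch that relies on Python's strict UTF-8 decoder,
not a port of utf8conditioner; the function name first_invalid_utf8 and the
sample bytes are made up for illustration.

```python
def first_invalid_utf8(data: bytes):
    """Return (byte offset, offending byte) for the first invalid
    UTF-8 sequence in data, or None if data decodes cleanly."""
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad sequence
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


# Hypothetical sample: e acute sent as the single Latin-1 byte 0xE9
bad = b"mus\xe9e"
# The same text properly encoded as UTF-8
good = "mus\u00e9e".encode("utf-8")

print(first_invalid_utf8(bad))   # offset 3, byte 0xE9 (printed as 233)
print(first_invalid_utf8(good))  # None
```

Unlike utf8conditioner, this only reports the first problem and does not
substitute anything; it is meant as a quick sanity check before serving a
response.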
The correct UTF-8 encoding for character code E9 (U+00E9) is the two-byte
sequence C3 A9.
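
To see where C3 A9 comes from, here is a small Python sketch of the
two-byte UTF-8 form (110xxxxx 10xxxxxx) used for code points U+0080
through U+07FF. The helper name encode_two_byte is hypothetical and is
written out only to show the bit arithmetic; in practice you would just
use your language's built-in UTF-8 encoder.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only U+0080..U+07FF use the two-byte form")
    lead = 0xC0 | (code_point >> 6)    # 110xxxxx carries the top 5 bits
    cont = 0x80 | (code_point & 0x3F)  # 10xxxxxx carries the low 6 bits
    return bytes([lead, cont])


# 0xE9 is 000 1110 1001 in binary: top bits 00011, low bits 101001
print(encode_two_byte(0xE9).hex())              # c3a9
print("\u00e9".encode("utf-8").hex())           # c3a9, from the built-in encoder
```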