[OAI-implementers] valid character encoding

Simeon Warner simeon@cs.cornell.edu
Wed, 13 Aug 2003 10:47:48 -0400 (EDT)


On Wed, 13 Aug 2003, Todd White wrote:
> is there a limited number of valid character encodings for a valid OAI
> repository?

You must use UTF-8, see:
http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm#XMLResponse
 
> the encoding i am using is "ISO-8859-1"  this is to support some special
> characters in our metadata that were not supported by UTF-8.

I believe all of ISO-8859-1 (Latin 1) is supported in Unicode with code
positions unchanged. The bytes will, of course, be different in a UTF-8
encoded stream.
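
A minimal sketch of that point, in Python (my choice of language here; the
sample string and the helper name latin1_to_utf8 are made up for
illustration): the code position of a Latin 1 character is unchanged in
Unicode, but its byte representation changes once the text is written out
as UTF-8.

```python
# Hypothetical sample text: "caf" followed by e-acute, which is 0xE9 in ISO-8859-1.
text = "caf\u00e9"

latin1_bytes = text.encode("iso-8859-1")  # one byte per character
utf8_bytes = text.encode("utf-8")         # e-acute becomes two bytes

# The Unicode code position equals the Latin 1 byte value...
assert ord(text[-1]) == latin1_bytes[-1] == 0xE9
# ...but the encoded bytes differ between the two encodings.
assert latin1_bytes == b"caf\xe9"
assert utf8_bytes == b"caf\xc3\xa9"


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (illustrative helper)."""
    return data.decode("iso-8859-1").encode("utf-8")
```

Assuming the stored metadata really is ISO-8859-1, passing it through a
conversion like this before writing the response should be enough to
satisfy the UTF-8 requirement.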

Note that Microsoft's CP1252 uses codes 0x80--0x9F for printable characters
which aren't in Latin 1 and do require translation to different Unicode code
positions, see:
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
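
If the data turns out to be CP1252 rather than true Latin 1, the decoding
step has to say so. The following is only a sketch using Python's built-in
"cp1252" and "iso-8859-1" codecs; the helper name cp1252_to_utf8 is
hypothetical. It lists the byte values where the two encodings disagree
and converts CP1252 input to UTF-8.

```python
# Upper-half byte values whose CP1252 decoding differs from Latin 1.
# errors="replace" maps bytes the cp1252 codec leaves undefined to U+FFFD,
# so they are counted as differing instead of raising an exception.
differing = [
    b for b in range(0x80, 0x100)
    if bytes([b]).decode("cp1252", errors="replace")
    != bytes([b]).decode("iso-8859-1")
]


def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode CP1252 bytes as UTF-8 (illustrative helper).

    Raises UnicodeDecodeError for bytes that CP1252 leaves undefined.
    """
    return data.decode("cp1252").encode("utf-8")


# Byte 0x80 is the euro sign (U+20AC) in CP1252 but a control code in Latin 1.
assert bytes([0x80]).decode("cp1252") == "\u20ac"
assert bytes([0x80]).decode("iso-8859-1") == "\u0080"

print([hex(b) for b in differing])
```

Decoding CP1252 data as Latin 1 would not fail, it would just silently
produce control characters in place of the intended punctuation, so it is
worth checking which of the two the source data really is.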

Cheers,
Simeon
 
> when i tested our newly developed OAI repository software using the
> web-based Open Archives Initiative - Repository Explorer
> (http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai) it told me...
> 
>   XML Schema Validation Error !
>   Illegal character encoding in XML
> 
> here's the URL to our repository:
>   http://michiganteacher.net/oai