[OAI-implementers] Unicode table

Simeon Warner simeon@lanl.gov
Mon, 30 Jul 2001 21:37:36 -0600 (MDT)


Though not quite on the same subject, I spent some time today looking into
bad UTF-8/XML exported by OAI data providers. I found the following
tables useful in finding out where the bad characters came from:

Windows 8-bit (codepage 1252) -> Unicode
  ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
Mac Roman -> Unicode
  ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/MAC/ROMAN.TXT
Local advice from Tanmoy Bhattacharya:
  0x0B (illegal in XML) is TeX for \ff, as in effect
  0x7B ( { ) is TeX for \endash, as in Lennard--Jones
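For anyone who wants to automate the hunt, here is a rough Python sketch. It assumes Python's built-in "cp1252" and "mac_roman" codecs agree with the two tables above. It reports bytes that break UTF-8 or are forbidden in XML 1.0, and shows how each code page would read them. TEX_HINTS and all function names are hypothetical and only encode the 0x0B advice; 0x7B cannot be flagged automatically because "{" is legitimate text.

```python
from __future__ import annotations

# Sketch: locate bytes that break UTF-8 or XML 1.0 and show how the
# Windows-1252 and Mac Roman code pages would read them.

# Hypothetical hint table built from the TeX advice above (OT1 font position).
TEX_HINTS = {0x0B: "TeX ff ligature, as in 'effect'"}

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
XML_ILLEGAL = set(range(0x00, 0x09)) | {0x0B, 0x0C} | set(range(0x0E, 0x20))


def invalid_utf8_offsets(data: bytes) -> list[int]:
    """Offsets where strict UTF-8 decoding fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder is valid UTF-8
        except UnicodeDecodeError as err:
            bad = pos + err.start  # err.start is relative to the slice
            offsets.append(bad)
            pos = bad + 1  # skip the offending byte and keep scanning
    return offsets


def xml_illegal_offsets(data: bytes) -> list[int]:
    """Offsets of control bytes that are never allowed in XML 1.0.

    Bytes below 0x80 always stand alone in UTF-8, so a raw byte scan is safe.
    """
    return [i for i, b in enumerate(data) if b in XML_ILLEGAL]


def explain_byte(b: int) -> dict:
    """Candidate readings of one byte under each legacy code page."""
    info = {"byte": f"0x{b:02X}"}
    for codec in ("cp1252", "mac_roman"):
        try:
            ch = bytes([b]).decode(codec)
            info[codec] = f"U+{ord(ch):04X} {ch!r}"
        except UnicodeDecodeError:
            info[codec] = None  # unassigned in this code page
    if b in TEX_HINTS:
        info["tex"] = TEX_HINTS[b]
    return info


def report(data: bytes) -> list[tuple[int, str, dict]]:
    """Every suspicious byte with its offset, the problem and candidate readings."""
    found = [(off, "invalid UTF-8") for off in invalid_utf8_offsets(data)]
    found += [(off, "illegal in XML 1.0") for off in xml_illegal_offsets(data)]
    return [(off, why, explain_byte(data[off])) for off, why in sorted(found)]


if __name__ == "__main__":
    # "different" typeset with a TeX ff ligature, then a Latin-1/cp1252 "é".
    for off, why, info in report(b"di\x0Berent caf\xe9"):
        print(off, why, info)
```

For byte 0x92, for example, the cp1252 column gives U+2019 (right single quote). That is a strong hint the record was pasted from a Windows word processor.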

Cheers,
Simeon.


On Mon, 30 Jul 2001, herbert van de sompel wrote:
> I kind of like http://www.hclrss.demon.co.uk/unicode/
>
> herbert
>
> Jose Blanco wrote:
> >
> > All:
> >
> > As you know, OAI requires that Unicode be used when needed. So I have been
> > looking on the web for a comprehensive mapping of character entities to
> > Unicode code points, and of the characters themselves to Unicode. Does
> > anyone have any idea where I could find this?
> >
> > Thanks,
> > Jose
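Jose's question has two halves: named entities to code points, and characters to code points. As one possible illustration, Python's standard library already ships the HTML 4 entity table (html.entities.name2codepoint) and the Unicode Character Database (unicodedata). The sketch below uses hypothetical function names. It rewrites HTML named entities as numeric character references, which seems the safer choice for OAI records because XML itself predefines only &amp;, &lt;, &gt;, &quot; and &apos;.

```python
import re
import unicodedata
from html.entities import name2codepoint

# XML 1.0 predefines only these five entities; everything else needs a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def entities_to_numeric(text: str) -> str:
    """Rewrite HTML 4 named entities (e.g. &eacute;) as numeric refs (&#233;)."""
    def repl(m):
        name = m.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return m.group(0)  # keep XML built-ins and unknown names untouched
        return f"&#{name2codepoint[name]};"
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)


def describe(ch: str) -> str:
    """Code point and official Unicode character name for one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


if __name__ == "__main__":
    print(entities_to_numeric("caf&eacute; &amp; Lennard&ndash;Jones"))
    # prints: caf&#233; &amp; Lennard&#8211;Jones
    print(describe("\u00e9"))
    # prints: U+00E9 LATIN SMALL LETTER E WITH ACUTE
```

Numeric references survive any XML parser without a DTD. The full character-to-name table is what describe() draws on.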