[OAI-implementers] XML encoding problems with DSpace at MIT

David Woodward dwoo@loc.gov
Wed, 19 Feb 2003 09:40:28 -0500


The 1.1 provider is back up and running at
http://memory.loc.gov/cgi-bin/oai1_1  (and cgi-bin/oai for that
matter). Sorry for any inconvenience. The 2.0 version
(http://memory.loc.gov/cgi-bin/oai2_0) does supersede it, but we have
not (except by accident) disabled support for the 1.1 repository.

Dave


>>> Tim Brody <tim@tim.brody.btinternet.co.uk> 02/18/03 10:30AM >>>
Celestial keeps a record of errors that occurred during harvesting:
http://celestial.eprints.org/cgi-bin/status 

I reset the errors occasionally to save space.

The mods format appears to be AWOL:
http://celestial.eprints.org/cgi-bin/status?action=repository;metadataFormat=66


The OAI 1.1 memory.loc.gov interface is returning internal server
errors. Has this interface been removed (does lcoa1 supersede it)?

How to determine what character encoding a PDF is in probably depends
on your PDF tool (unless you fancy writing a PDF parser :-)
Reading the PDF spec:
http://partners.adobe.com/asn/developer/acrosdk/docs/pdfspec.pdf 

The default encoding is ISOLatin1; otherwise, quoting the doc:
"If text is encoded in Unicode the first two bytes of the text must be
the Unicode Byte Order marker, <FE FF>."

I guess that if a Text object in PDF is in Unicode it uses UTF-16. I've
not done anything with PDF metadata to know for certain.
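
As a rough illustration of the byte-order-marker rule quoted above, here
is a minimal Python sketch. It assumes you have already pulled the raw
bytes of a text string (a Title or Author value, say) out of the PDF with
whatever tool you use; the extraction itself is not shown. The function
names are made up for this example, and Latin-1 is only a stand-in for
the default single-byte encoding mentioned above. The second function is
a plain strict-decode test for whether some bytes are well-formed UTF-8.

```python
import codecs
from typing import Tuple


def decode_pdf_text_string(raw: bytes) -> Tuple[str, str]:
    """Return (encoding_label, text) for the raw bytes of one PDF text string.

    Hypothetical helper: the caller is assumed to have extracted `raw`
    from the PDF already.
    """
    if raw.startswith(codecs.BOM_UTF16_BE):  # the <FE FF> marker
        # Skip the two marker bytes, then decode the rest as big-endian UTF-16.
        return "utf-16-be", raw[2:].decode("utf-16-be")
    # No marker: fall back to a single-byte decoding. Latin-1 is an
    # assumption used here as an approximation of the default encoding.
    return "latin-1", raw.decode("latin-1")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` is a well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")  # strict decoding raises on malformed input
    except UnicodeDecodeError:
        return False
    return True
```

Note that pure ASCII passes the UTF-8 check and is also valid Latin-1, so
the check can only prove that a string is NOT UTF-8; it cannot confirm
which encoding the producer intended.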

All the best,
Tim.

Caroline Arms wrote:

> As a data provider, LC would like to know if it is generating invalid
> characters.  The gradual migration to UNICODE is going to give us all
> problems, in part BECAUSE some systems work so hard to recognize different
> character encodings and adjust.  I'm with Hussein.  Notify data providers
> of problems (even if you do adjust) so that the problem can be fixed as
> close to home as possible.
> 
> As a related aside, if anyone has a suggestion for an efficient way
> (preferably unix-based) to check that the metadata in a PDF file is stored
> in UTF-8 encoding (or consistently in any other UNICODE encoding), I'd be
> interested.
> 
> Caroline Arms
> Office of Strategic Initiatives
> Library of Congress

_______________________________________________
OAI-implementers mailing list
OAI-implementers@oaisrv.nsdl.cornell.edu 
http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers