[OAI-implementers] converting filenames of metadata records

John Weatherley jweather@ucar.edu
Wed, 22 Oct 2003 11:03:11 -0600 (MDT)


For now, it is not possible to tell the DLESE OAI software to leave the
colons in file names unencoded rather than converting them to %3A. The
software does this because Windows file systems don't accept colons (and
some other chars) as valid characters, and the software is designed to be
cross-platform compatible. 
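To make the encoding concrete, here is a minimal sketch of a reversible scheme of this kind. Each character that Windows rejects in file names, plus '%' itself so that decoding stays unambiguous, becomes '%' followed by two hex digits. That is how ':' turns into %3A. The class FileNameCodec and its methods are hypothetical. This is not the DLESE OAI software's actual code, and the exact set of encoded characters is an assumption.

```java
// Hypothetical sketch: NOT the DLESE OAI software's actual encoder.
public final class FileNameCodec {

    // Characters Windows does not allow in file names (assumed set),
    // plus '%' itself so that decode() can always reverse encode().
    private static final String UNSAFE = "\\/:*?\"<>|%";

    private FileNameCodec() {}

    /** Replaces each unsafe character with '%' and two uppercase hex digits. */
    public static String encode(String id) {
        StringBuilder sb = new StringBuilder();
        for (char c : id.toCharArray()) {
            if (UNSAFE.indexOf(c) >= 0) {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Reverses encode(): turns each "%XX" sequence back into its character. */
    public static String decode(String name) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < name.length()) {
            char c = name.charAt(i);
            if (c == '%' && i + 2 < name.length()) {
                sb.append((char) Integer.parseInt(name.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }
}
```

With this sketch, encode("oai:sammelpunkt.philo.at:103") gives "oai%3Asammelpunkt.philo.at%3A103". Running decode on a record's file name recovers the OAI identifier, which can be handy as a key when indexing the records.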

A number of people have reported having this problem, however, so I may
change the way file names are encoded in future releases of the software
to make them easier to work with (suggestions anyone?).

That being said, I have had success opening and reading files that are
encoded this way using the dom4j XML APIs (available at
http://www.dom4j.org/). Sample code:

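import java.io.File;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

// Read every record file in the harvest directory; the %3A in the
// file names is left as-is.
SAXReader reader = new SAXReader();
File dir = new File("/home/jweather/ ... /dlese.org/oai/provider/avc/nsdl_dc");
File[] files = dir.listFiles();

for (int i = 0; i < files.length; i++) {
    try {
        Document document = reader.read(files[i]);
        // Process the doc...
    } catch (DocumentException e) {
        System.err.println("Could not parse " + files[i] + ": " + e.getMessage());
    }
}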


Another possibility for your code below: in the builder.build(...)
call, try passing in a java.io.InputStream, java.io.Reader or
java.net.URL instead of the java.io.File and see if that works.
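The InputStream variant can be sketched with the JDK's own DOM parser (javax.xml.parsers). It stands in for JDOM here so the example needs no extra jar. Opening the file with FileInputStream hands the parser bytes rather than a path, so the parser never has to interpret the %3A in the file name as a URL escape. The exception in the question shows colons where the file on disk has %3A, which suggests that is what goes wrong with the File version. The class RecordReader and its method readRecord are hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: parses a record file through an InputStream
// instead of passing the File (and thus a path/URL) to the parser.
public final class RecordReader {

    private RecordReader() {}

    public static Document readRecord(File recordFile)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // OAI records use XML namespaces
        DocumentBuilder builder = factory.newDocumentBuilder();
        // FileInputStream opens the literal path, "%3A" included;
        // the parser only sees the bytes, not a URL it would decode.
        try (InputStream in = new FileInputStream(recordFile)) {
            return builder.parse(in);
        }
    }
}
```

With JDOM, the corresponding change to the code in the question would be builder.build(new FileInputStream(recordfile)), closing the stream afterwards; SAXBuilder has a build(InputStream) overload.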

 - john

On Wed, 22 Oct 2003, Thomas Krämer wrote:

> Hello
> I am developing middleware that uses metadata harvested with the
> DLESE OAI software.
> Thus there is a directory with hundreds of metadata records that are
> not sorted, and no queries can be formulated to retrieve the
> relevant ones among them.
> Q1: Am I right in assuming that repositories DO NOT offer any search
> interfaces, but provide their entire metadata and nothing more?
> Q2: Am I right in assuming that the DLESE OAI software has the Apache Lucene
> search API integrated, but that it is not yet working?
> However, I am currently trying to use the Apache Lucene search API to
> index these records and make them searchable.
> A problem appears when I try to read a record:
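>      SAXBuilder builder = new SAXBuilder();
>      try {
>        Document doc = builder.build(recordfile.getAbsoluteFile());
>        Element root = doc.getRootElement();
>        listChildren(root, 0);
>      } catch (JDOMException e) {
>        e.printStackTrace();
>      } catch (IOException e) {
>        e.printStackTrace();
>      }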
> I always get a java.io.FileNotFoundException, as the OAI-PMH harvester
> changes the separator ":" into "%3A".
> The pathname shown while debugging is the correct one (it uses "%3A",
> like the record files on my system),
> but the exception tells me:
> java.io.FileNotFoundException: 
> /home/tom/mwd/metadata/7374617475733D696E7072657373/oai_dc/oai:sammelpunkt.philo.at:103.xml 
> (No such file or directory)
> I am working on a Linux system.
> Q3: Is it possible to tell the DLESE OAI software to save the records on
> the local system using ":" instead of the hex representation, or to
> wrap the record file names in a way that
> allows the native Java classes to open the records?
> Thanks a lot for any hints,
> Thomas
> _______________________________________________
> OAI-implementers mailing list
> List information, archives, preferences and to unsubscribe:
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers

John Weatherley
Software Engineer
DLESE Program Center
University Corporation for Atmospheric Research (UCAR)
Box 3000
Boulder, CO 80307-3000
jweather@ucar.edu (e-mail)   
(303) 497-2680 (tel)
(303) 497-8336 (fax)