[Orechem] Some new things to look at - please read completely

Carl Lagoze clagoze at gmail.com
Wed Oct 28 10:22:26 EDT 2009


Hi all,

At our recent meeting in Pittsburgh I took on a few action items to be
completed by last Friday. I missed that deadline, my apologies, but
here they are.

First, I have put two new pages up on the wiki, accessible from the
front page. One lists presentations relevant to the project and the
other lists papers. I have also created corresponding places in our
SVN repository for these items. This repository is located at
http://services.nsdl.org/svn/oreChem/ and now contains three
directories: meeting materials, presentations, and papers. This
repository should be read-write accessible by all of you. After
storing your materials here you can then link to them from the
appropriate wiki pages. I have uploaded a number of presentations and
papers from recent meetings, workshops, and conferences. For example,
in the papers wiki page I have linked to papers submitted to the
recent Web Science and Microsoft eScience conferences, and to copies
of the eChemistry workshop white paper and the upcoming Nature
Chemistry article on that white paper. Please note that copies of
papers stored on these internal project pages are not public and
should be distributed only to members of the project.

Second, I had promised to do some design work on our information
transfer format. This work is now written up on the wiki
(http://services.nsdl.org/trac/oreChem/wiki/DocumentOntology#Atomizingdocumentontologyinstances).
Briefly summarizing, I have put together an OAI-ORE-based Atom
encoding for the document information that we extract from documents
and then pass along to subsequent partners. It is based on the
workflow that we discussed at our meeting, whereby Penn State analyzes
a raw document, extracts bibliographic and other data, encodes that as
an OAI-ORE aggregation and resource map, and then makes it available
via Atom to Southampton, which extracts additional information,
enhances the aggregation and resource map, and passes it on to
Cambridge, etc. Following the suggestion of Jim Downing, I believe
this should all be done using the Atom feed paging and archiving
technique outlined in RFC 5005 (http://www.ietf.org/rfc/rfc5005.txt).
This workflow is illustrated in slide 17 of the slides accessible at
ms escience 2009.pptx.
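
To make the RFC 5005 suggestion concrete, here is a rough sketch of
how a downstream partner might read an archived feed: start at the
subscription document, collect its entries, then follow the
rel="prev-archive" link to each older archive document until there
are none left. This is only an illustration under assumptions, not
the agreed design. The function names (link_href, walk_archive) and
the injected fetch callable are hypothetical, and the sketch assumes
the link hrefs are absolute URLs.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

ATOM_NS = "http://www.w3.org/2005/Atom"


def link_href(feed: ET.Element, rel: str) -> Optional[str]:
    """Return the href of the first feed-level atom:link with this rel."""
    for link in feed.findall(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == rel:
            return link.get("href")
    return None


def walk_archive(
    start_url: str, fetch: Callable[[str], str]
) -> Iterator[Optional[str]]:
    """Yield atom:id values, newest document first, across all archives.

    `fetch` (hypothetical) maps a URL to the feed document text. It is
    passed in so the traversal can be tried without a network connection.
    """
    url: Optional[str] = start_url
    seen = set()  # guard against a misconfigured feed that links in a loop
    while url and url not in seen:
        seen.add(url)
        feed = ET.fromstring(fetch(url))
        for entry in feed.findall(f"{{{ATOM_NS}}}entry"):
            yield entry.findtext(f"{{{ATOM_NS}}}id")
        # RFC 5005 archived feeds point at the next-older document this way.
        url = link_href(feed, "prev-archive")
```

A real client would pass a fetch function that performs an HTTP GET
and would resolve relative hrefs against the document URL; both are
left out here to keep the sketch short.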

It's critically important that we converge on this (or some other)  
transfer format and resolve the few open issues that I list as soon as  
possible. Therefore, I would like each group to assign one person to  
look at the format, evaluate whether it is sensible and whether it  
fulfills our needs, and determine whether they can implement it so we
can get experiments running as soon as possible (e.g. before our
December meeting). I have put up a Doodle page at
http://www.doodle.com/a9w2r3u3xw8zah5d to schedule a phone call among
the people who will do this evaluation; I will of course be on that
call. Again, this call is only for the individuals in each group who
will be doing this evaluation and possibly developing the code to
implement it.

I'd like to emphasize again that it is very important that we move  
forward with this and implement it as soon as possible so we can move  
on from our data collection and extraction process to application  
development upon the triples that we will store.

Finally, I'd like to remind you of our next all-group meeting in
London on December 5 and 6. I will be getting back to you in the
near future with logistics and will schedule a phone call among all of
us a week or two before that meeting.

I still need to get the notes up from the previous meeting, but I
considered this stuff the first priority.

Thanks, Carl