[Orechem] Some new things to look at - please read completely
Carl Lagoze
clagoze at gmail.com
Wed Oct 28 10:22:26 EDT 2009
Hi all,
At our recent meeting in Pittsburgh I took on a few action items to be
completed by last Friday. I missed that deadline, my apologies, but
here they are.
First, I have put two new pages up on the wiki accessible from the
front page. One is to list presentations relevant to the project and
the other to list papers. I have also created corresponding places in
our SVN repository for these items. This repository is located at
http://services.nsdl.org/svn/oreChem/ and now contains three
directories: meeting materials, presentations, and papers. This
repository should be read/write
accessible by all of you. After storing your materials here you can
then link to them from the appropriate wiki pages. I have uploaded a
number of presentations and papers from recent meetings, workshops,
and conferences. For example, on the papers wiki page I have linked to
papers submitted to the recent Web Science and Microsoft eScience
conferences, and to copies of the eChemistry workshop white paper and
the upcoming Nature Chemistry article on that white paper.
Please note that copies of papers stored on these internal project
pages are not public and should be distributed only to members
of the project.
Second, I had promised to do some design work on our information
transfer format. This work is now written up on the wiki:
http://services.nsdl.org/trac/oreChem/wiki/DocumentOntology#Atomizingdocumentontologyinstances
Briefly summarizing, I have put together an OAI-ORE-based Atom
encoding for the information that we extract from documents and then
pass along to subsequent partners. It is based on the workflow that we
discussed at our meeting, whereby Penn State analyzes a raw document,
extracts bibliographic and other data, encodes that as an OAI-ORE
aggregation and resource map, and then makes it available via Atom to
Southampton, which extracts additional information, enhances the
aggregation and resource map, and passes it on to Cambridge, etc.
Following the suggestion of Jim Downing, I believe this all should be
done using the Atom feed paging and archiving technique outlined in
RFC 5005 (http://www.ietf.org/rfc/rfc5005.txt). This workflow is
illustrated on slide 17 of the slides accessible at ms
escience 2009.pptx.
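
As a rough illustration of the RFC 5005 archiving technique mentioned above, here is a minimal Python sketch of how a downstream partner might walk an archived Atom feed back through its "prev-archive" links and collect every entry, oldest document first. The feed URLs, the entry ids, and the helper names (prev_archive_url, entry_ids, walk_archive) are hypothetical and are not taken from the design write-up on the wiki; the feeds are held in memory so that the sketch runs without network access.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed documents, keyed by made-up URLs. The subscription
# document points at one archive document via rel="prev-archive" (RFC 5005).
FEEDS = {
    "http://example.org/feed": """<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="self" href="http://example.org/feed"/>
  <link rel="prev-archive" href="http://example.org/archive/1"/>
  <entry><id>urn:example:rem:3</id></entry>
</feed>""",
    "http://example.org/archive/1": """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0">
  <fh:archive/>
  <link rel="current" href="http://example.org/feed"/>
  <entry><id>urn:example:rem:1</id></entry>
  <entry><id>urn:example:rem:2</id></entry>
</feed>""",
}


def prev_archive_url(feed_xml):
    """Return the href of the feed-level rel="prev-archive" link, or None."""
    root = ET.fromstring(feed_xml)
    for link in root.findall(ATOM + "link"):
        if link.get("rel") == "prev-archive":
            return link.get("href")
    return None


def entry_ids(feed_xml):
    """Return the atom:id of each entry, in document order."""
    root = ET.fromstring(feed_xml)
    return [entry.findtext(ATOM + "id") for entry in root.findall(ATOM + "entry")]


def walk_archive(start_url, fetch):
    """Follow prev-archive links from start_url; return ids, oldest document first.

    fetch is any callable mapping a URL to the feed document text.
    """
    docs, seen, url = [], set(), start_url
    while url is not None and url not in seen:  # guard against link loops
        seen.add(url)
        doc = fetch(url)
        docs.append(doc)
        url = prev_archive_url(doc)
    ids = []
    for doc in reversed(docs):  # oldest archive document first
        ids.extend(entry_ids(doc))
    return ids


if __name__ == "__main__":
    print(walk_archive("http://example.org/feed", FEEDS.__getitem__))
```

In a real deployment the fetch argument would be an HTTP GET against the upstream partner's feed, and each entry would carry the ORE resource map rather than just an id; both of those details are left out of this sketch.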
It's critically important that we converge on this (or some other)
transfer format and resolve the few open issues that I list as soon as
possible. Therefore, I would like each group to assign one person to
look at the format, evaluate whether it is sensible and whether it
fulfills our needs, and determine whether they can implement it so we
can get experiments running as soon as possible (e.g. before our
December meeting). I have put up a Doodle page at
http://www.doodle.com/a9w2r3u3xw8zah5d to schedule a phone call among
the people who will do this evaluation; I will of course be on that
call. Again, this call is only for the individuals in each group who
will be doing this evaluation and possibly developing the code to
implement it.
I'd like to emphasize again that it is very important that we move
forward with this and implement it as soon as possible so we can move
on from our data collection and extraction process to application
development upon the triples that we will store.
Finally, I'd like to remind you of our next all-group meeting in
London on December 5 and 6. I will be getting back to you in the
near future with logistics and will schedule a phone call a week or
two before that meeting amongst all of us.
I still need to get the notes up from the previous meeting, but I
considered this the first priority.
Thanks, Carl