[Orechem] Re: High-throughput semantic computation in OREChem

Mon Jun 15 09:16:43 EDT 2009

Dear Dr Williams;

Thank you for your expert input; I must confess to only looking at a few CASE
examples from Google Scholar. We do, however, reference your recent work in the
Journal of Cheminformatics in our paper on information extraction from
documents.

There was an excellent presentation given by Jiawei Han at PSU last Friday; I
foresee great application of his algorithms to CASE. Your expert advice would
also be greatly valued. I will indeed be in touch and cc Peter et al.; there are
a number of folks here at PSU who will no doubt be interested as well.

many thanks again;

-bill brouwer
(PSU Chemistry)

On Mon, Jun 15, 2009 08:25 AM, Antony Williams
<WilliamsA at rsc.org> wrote:
>
>Dr Brouwer,
>
>Things have come a long way since the 1996 article you reference. For a modern
>review regarding automated structure elucidation I refer you to:
>
> M. E. Elyashberg, A. J. Williams, and G. E. Martin. Computer-Assisted
>Structure Verification and Elucidation Tools In NMR-Based Structure
>Elucidation. Review article. Progress in NMR Spectroscopy (2007)
>10.1016/j.pnmrs.2007.04.003
>
>As one of the authors of this work, I invite you to contact me at the email
>address below, and I can provide some guidance regarding where automated
>structure elucidation is today. There are commercial products on the market
>today, and Christoph Steinbeck from EBI has his own work going on in this area.
>Peter knows him well, I believe. Chris also has Open Source NMR prediction
>tools online at NMRShiftDB that you might be interested in using.
>
>As the product manager for a CASE (Computer-Assisted Structure Elucidation)
>system for many years, I can say that the challenge is not easy but that
>there has been a lot of progress made over the years. Some other papers that
>might interest you are listed below (excuse the numbers...snipped out of my CV).
>
>On ChemSpider we hope to integrate Chris Steinbeck's NMR shift prediction
>technology in the near future. Best wishes.
>
>44. G.E. Martin, C.E. Hadden, D.J. Russell, B.D. Kaluzny, J.E. Guido, W.K.
>Duholke, B.A. Stiemsma, T.J. Thamann, R.C. Crouch, K.A. Blinov, M.E.
>Elyashberg, E.R. Martirosian, S.G. Molodtsov, A.J. Williams, P.L. Schiff, Jr.,
>Identification of Degradants of a Complex Alkaloid Using NMR Cryoprobe
>Technology and ACD/Structure Elucidator, J. Heterocyclic Chem. 39, 1241
>(2002)
>
>45. M.E. Elyashberg, K.A. Blinov, A.J. Williams, E.R. Martirosian, S.G.
>Molodtsov, Application of a New Expert System for the Structure Elucidation of
>Natural Products from the 1D and 2D NMR Data, J. Nat. Prod., 65, 693
>(2002)
>
>50. K. A. Blinov, D. Carlson, M. E. Elyashberg, G. E. Martin, E. R.
>Martirosian, S. Molodtsov, and A. J. Williams, Computer-Assisted Structure
>Elucidation of Natural Products with Limited 2D NMR Data: Applications of the
>StrucEluc System, Magn. Reson. Chem., 41, 359-372 (2003).
>
>51. G. E. Martin, D. J. Russell, K. A. Blinov, M. E. Elyashberg and A. J.
>Williams, Applications and Advances in Cryogenic NMR Probes &
>Computer-Assisted Structure Elucidation. Ann. Magn. Reson., 2, 1-31
>(2003)
>
>52. A.J. Williams. Recent Advances in NMR Prediction and Automated Structure
>Elucidation Software. Current Opinion in Drug Discovery & Development, 3,
>298 (2003)
>
>53. K. Blinov, M. Elyashberg, E. R. Martirosian, S. G. Molodtsov, A. J.
>Williams, M. H. M. Sharaf, P. L. Schiff, Jr., R. C. Crouch, G. E. Martin, C. E.
>Hadden, and J. E. Guido, “Quindolinocryptotackieine: The Elucidation of a
>Novel Indoloquinoline Alkaloid Structure through the Use of Computer-Assisted
>Structure Elucidation and 2D-NMR,” Magn. Reson. Chem., 41, 577-584
>(2003).
>
>54. M. E. Elyashberg, K. A. Blinov, E. R. Martirosian, S. G. Molodtsov, A. J.
>Williams, and G. E. Martin, Automated Structure Elucidation – The Benefits of
>a Symbiotic Relationship between the Spectroscopist and the Expert System, J.
>Heterocyclic Chem., 40, 1017-1029 (2003).
>
>2004
>55. M. E. Elyashberg, K. A. Blinov, A. J. Williams, S. G. Molodtsov, G. E.
>Martin, and E. R. Martirosian, Structure Elucidator: A Versatile Expert System
>for Molecular Structure Elucidation from 1D and 2D NMR Data and Molecular
>Fragments, J. Chem. Inf. Comput. Sci. 44, 771-792 (2004).
>
>57. S. G. Molodtsov, M. E. Elyashberg, K. A. Blinov, A. J. Williams, E. E.
>Martirosian, G. E. Martin, and B. Lefebvre. Structure Elucidation from 2D NMR
>Spectra Using the StrucEluc Expert System: Detection and Removal of
>Contradictions in the Data. J. Chem. Inf. Comp. Sci., 44, 1737-1751
>(2004)
>
>58. G. J. Sharman, I. C. Jones, M. P. Parnell, M. C. Willis, M. F. Mahon, D. V.
>Carlson, A. J. Williams, M. E. Elyashberg, K. A. Blinov, S. G. Molodtsov.
>Automated structure elucidation of two products in a reaction of an
>α,β-unsaturated pyruvate. Magn. Reson. Chem. 42, 567 (2004)
>
>2005
>60. Y. D. Smurnyy, M. E. Elyashberg, K. A. Blinov, B. A. Lefebvre, G. E.
>Martin, and A. J. Williams, Computer-Aided Determination of Relative
>Stereochemistry and 3D Models of Complex Organic Molecules from 2D NMR Spectra,
>Tetrahedron, 61, 9980-9989 (2005).
>
>63. S. S. Golotvin, E. Vodopianov, B. A. Lefebvre, A. J. Williams, and T. D.
>Spitzer. Automated structure verification based on 1H NMR prediction. Magn.
>Reson. Chem., 44, 524 (2006)
>
>66. M. E. Elyashberg, K. A. Blinov, A. J. Williams, S. G. Molodtsov, and G. E.
>Martin, Are Deterministic Expert Systems for Computer-Assisted Structure
>Elucidation Obsolete? J. Chem. Inf. Model. 46, 1643-1656 (2006).
>
>70. M. E. Elyashberg, K. A. Blinov, S. G. Molodtsov, A. J. Williams, and G. E.
>Martin, Fuzzy Structure Generation: An Efficient New Tool for Computer-Aided
>Structure Elucidation (CASE), J. Chem. Inf. Model., 47, 1053-1066
>(2007). 10.1021/ci600528g
>
>76. S. S. Golotvin, E. Vodopianov, R. Pol, B. A. Lefebvre, A. J. Williams, R.
>D. Rutkowske and T. D. Spitzer, Automated structure verification based on a
>combination of 1D 1H NMR and 2D 1H–13C HSQC spectra, Magn. Reson. Chem. 2007,
>45, 803–813
>
>90. M. E. Elyashberg, A. J. Williams, D. C. Lankin, G. E. Martin, J. Porco, W.
>F. Reynolds, and C. Singleton, Applying Computer-Assisted Structure Elucidation
>Algorithms for the Purpose of Structure Validation – Revising the NMR
>Assignments of Hexacyclinol, J. Nat. Prod., 71, 581-588 (2008).
>
>Antony Williams, VP Strategic Development
>ChemSpider, Royal Society of Chemistry
>
>US Office: 904 Tamaras Circle, Wake Forest, NC-27587
>
>Phone: +1 (919) 201-1516
>Fax: +1 (919) 300-5321
>Email: antony.williams at chemspider.com
>
>URL: www.chemspider.com
>Blog: http://www.chemspider.com/blog
>Twitter: http://twitter.com/ChemSpiderman
>Skype: tony27587
>LinkedIn: http://www.linkedin.com/in/antonywilliams
>
>________________________________
>From: orechem-bounces at openarchives.org [orechem-bounces at openarchives.org] On
>Behalf Of WILLIAM J BROUWER [wjb19 at psu.edu]
>Sent: Friday, June 12, 2009 2:40 PM
>To: Peter Murray-Rust
>Cc: orechem at openarchives.org; Geoffrey Fox; Nick Day
>Subject: [Orechem] Re: High-throughput semantic computation in OREChem
>
>no worries Peter, understood. If anybody needs anything from me beyond what's
>in our paper & at the wiki already ->
>http://services.nsdl.org/trac/oreChem/wiki/ChemistryDataExtraction, do let me
>know...
>
>I'm going to start taking a look at the AI/search possibilities of our fitted
>spectra database then -> http://pubs.acs.org/doi/abs/10.1021/ci950092p
>
>cheers,
>bill
>
>On Fri, Jun 12, 2009 11:14 AM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>
>Great - that's exciting Bill and I am sure that it will be invaluable for
>assignment. However I am focussing on what we can integrate today. The
>integration problems are not trivial and the more that the components - or the
>sites - are modularised the faster progress we shall make.
>
>It's important to be pragmatic at this stage - there are things we can do now
>and things that are research. We should do both but we must make sure that the
>infrastructure continues in a straight line. I detailed what we could do at
>present (some as rough proof of concept) that could fit into a linear
>workflow. We must make sure that the research efforts in the pipeline I
>indicated are small as the integration itself will still be challenging.
>
>So I am proposing that we should ask:
>* what can we do by Friday 19?
>* what can we do by the start of August?
>* what can we do in the rest of the project?
>
>Each part depends on the previous one:
>* Mark needs a few papers from Lee/Prasenjit which have good PDF chemistry
>* PMR needs a few molecules and spectra in SVG
>* Marlon needs a few CML molecules and the NMREye workflow.
>
>I agree that Mark's work on general PDF parsing is exciting but we need a
>stream of molecules for the later stages.
>
>I am also going to suggest that we try to arrange weekly telcons to review
>progress. The problem of a pipeline/workflow is that all bits have to be
>delivering.
>
>P.
>
>
>
>
>On Fri, Jun 12, 2009 at 3:47 PM, WILLIAM J BROUWER <wjb19 at psu.edu> wrote:
>cool peter...
>
>I would also add that there's some mileage in substructure & similarity
>search on spectra. Han gave a great talk this morning, there is strong
>application of his graph mining work to building up complicated spectra on the
>basis of simpler (sub)spectra...
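A minimal sketch of what similarity and sub-spectrum search over peak lists could look like. This is not Han's graph-mining method; it swaps in a much simpler binned-histogram comparison for illustration. The function names, the 0.5 ppm bin width, and the example shift values are all assumptions rather than anything taken from the thread.

```python
import math
from collections import Counter


def bin_peaks(shifts_ppm, bin_width=0.5):
    """Map peak positions (ppm) to a sparse histogram keyed by bin index."""
    return Counter(math.floor(s / bin_width) for s in shifts_ppm)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse histograms (Counter objects)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def spectrum_similarity(x_ppm, y_ppm, bin_width=0.5):
    """Similarity in [0, 1] between two peak lists after binning."""
    return cosine_similarity(bin_peaks(x_ppm, bin_width), bin_peaks(y_ppm, bin_width))


def contains_subspectrum(full_ppm, sub_ppm, bin_width=0.5):
    """True if every binned peak of the sub-spectrum also occurs in the full one."""
    missing = bin_peaks(sub_ppm, bin_width) - bin_peaks(full_ppm, bin_width)
    return not missing


# Hypothetical 13C peak lists (ppm), for illustration only.
full = [14.1, 22.7, 31.9, 128.3]
sub = [14.2, 22.6]
score = spectrum_similarity(full, sub)
found = contains_subspectrum(full, sub)
```

Fixed-width binning is sensitive to peaks that fall near a bin edge, so a real search would likely need a tolerance-based match instead.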
>
>-bill
>
>
>On Fri, Jun 12, 2009 10:31 AM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>
>This is to review the subprojects that the computational geeks in OREChem have
>put together over the last few days. (a) is long term, (b) is immediate.
>(a) The general goal is to compute NMR spectra for all new published
>compounds and compare them with spectra. This is a new approach "robot
>refereeing of chemistry publications" and any differences suggest errors
>or new chemistry. This is long term (months) and consists of the
>following (as we have put on the wiki):
>* PSU-Lee/Prasenjit retrieve chemistry-rich docs from publisher sites (ask
>for forgiveness policy) and segment the papers into text+non-text
>(tables, diagrams). This passes to:
>* Mark - Soton extracts molecules and spectra out of this and converts them to
>SVG. The short-term goal is to get this working by the end of next week in a
>pragmatic form. (We do not mind if recall is poor as long as we get a few
>SVGs, as we need to develop the machine-learning and/or heuristics and find out
>what unknown horrors we have to deal with.) Bitmaps are rejected at this stage.
>* PMR - Cambridge develops heuristics to interpret (i) molecules and (ii)
>spectra (C13 and H1). These might later be crowdsourced. The output is CML
>molecules and spectra. It is unlikely we have assignments.
>* PSU - Bill+Karl. Analyse spectra with peak-fitting.
>* IU - Marlon. (independently) molecules are passed to IU in CML and
>put into the NMREye workflow for computing peaks (below). IU run this
>automatically and return results in CML
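The "robot refereeing" idea in (a) comes down to comparing a computed peak list with the reported one and flagging what does not line up. The sketch below shows one simple way to do that, by greedy nearest-neighbour pairing within a tolerance. The function name, the 2.0 ppm tolerance, and the example shifts are assumptions for illustration, not values from the project.

```python
def flag_discrepancies(computed_ppm, reported_ppm, tolerance_ppm=2.0):
    """Greedily pair each reported peak with the nearest unused computed peak.

    Returns (pairs, unmatched_reported, unmatched_computed), where each pair is
    a (reported, computed, abs_difference) tuple within the tolerance.
    """
    remaining = sorted(computed_ppm)
    pairs, unmatched_reported = [], []
    for r in sorted(reported_ppm):
        if remaining:
            nearest = min(remaining, key=lambda c: abs(c - r))
            if abs(nearest - r) <= tolerance_ppm:
                pairs.append((r, nearest, abs(nearest - r)))
                remaining.remove(nearest)  # each computed peak is used once
                continue
        unmatched_reported.append(r)
    return pairs, unmatched_reported, remaining


# Hypothetical 13C shifts (ppm), for illustration only.
computed = [14.0, 23.1, 130.0]
reported = [14.3, 22.8, 171.5]
pairs, only_reported, only_computed = flag_discrepancies(computed, reported)
```

Peaks left in either unmatched list are the candidates for "errors or new chemistry". Greedy pairing can mis-assign crowded regions, so an optimal assignment method may be preferable in practice.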
>
>(b) To get IU up to speed we shall start immediately on simple
>molecules from Pubchem. This involves just Cambridge and IU.
>* The NMREye workflow has been developed and tested and should work on simple
>organic compounds. It consists of the following:
>  - convert PubchemXML2CML (already available in JUMBO)
>  - convert CML to Gaussian input. We have an XSLT script, but could convert
>    this to Java in an hour.
>  - in parallel - create RDF metadata for provenance to this point (as
>    this does not survive the Gaussian run)
>  - ... submit and run job ... (IU) ... and collect results
>  - convert LOG file to CML (JUMBOMarker, effectively done)
>  - convert CML to RDF (JUMBO). Add GaussianOWL dictionary in RDF
>  - upload RDFs into repository/tripleStore
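The workflow steps above can be sketched as a linear chain of converters, with provenance kept beside the payload so that it survives the Gaussian step. Everything here is a hypothetical stand-in: the stage names, the `Record` type, the `processedBy` predicate, and the placeholder converters do not correspond to the real JUMBO, XSLT, or Gaussian tools.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Record:
    identifier: str
    payload: str
    fmt: str
    # Provenance is stored next to the payload, not inside it, so a stage
    # that discards metadata (such as the Gaussian run) cannot lose it.
    provenance: List[Tuple[str, str, str]] = field(default_factory=list)


# (stage name, output format, converter)
Stage = Tuple[str, str, Callable[[str], str]]


def run_pipeline(record: Record, stages: List[Stage]) -> Record:
    """Apply each stage in order, logging a provenance triple per stage."""
    for name, out_fmt, convert in stages:
        record.provenance.append((record.identifier, "processedBy", name))
        record.payload = convert(record.payload)
        record.fmt = out_fmt
    return record


def placeholder(label: str) -> Callable[[str], str]:
    """Stand-in converter that only wraps its input in a label."""
    return lambda text: f"{label}({text})"


STAGES: List[Stage] = [
    ("pubchemxml2cml", "cml", placeholder("cml")),
    ("cml2gaussian", "gaussian-input", placeholder("gjf")),
    ("run-gaussian", "gaussian-log", placeholder("log")),
    ("log2cml", "cml", placeholder("cml")),
    ("cml2rdf", "rdf", placeholder("rdf")),
]

result = run_pipeline(Record("CID-0", "<xml/>", "pubchem-xml"), STAGES)
```

Keeping each stage as an independent function matches the point made earlier in the thread that modular components make integration faster.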
>
>In (b) we would expect to get 10,000 - 100,000 small molecules from
>Pubchem of up to, say, 15 first row atoms. These already have 3D coordinates
>(I am ignoring conformers at this stage). The process should be
>automatic. Jobs take from 0.1 seconds to 1 day (probably) as they
>scale with N^4.
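As a rough worked example of the N^4 scaling: going from 0.1 seconds to 1 day is a factor of 864,000 in runtime, which under a pure fourth-power law corresponds to a factor of roughly 30 in N. The thread does not say what N measures (for example atoms or basis functions), so the sketch below treats it as an abstract problem size; the function names are assumptions.

```python
def estimated_runtime(n, n_ref, t_ref_seconds, exponent=4):
    """Scale a reference runtime by (n / n_ref) ** exponent."""
    return t_ref_seconds * (n / n_ref) ** exponent


def size_ratio_for_runtime_ratio(runtime_ratio, exponent=4):
    """Invert the power law: how much larger N gives this runtime ratio."""
    return runtime_ratio ** (1.0 / exponent)


SECONDS_PER_DAY = 86_400
# Ratio in N implied by the quoted range of 0.1 seconds to 1 day.
ratio = size_ratio_for_runtime_ratio(SECONDS_PER_DAY / 0.1)

# Doubling N from a 0.1-second reference job multiplies the time by 2**4.
doubled = estimated_runtime(2, 1, 0.1)
```

This ignores constant overheads and any change in convergence behaviour, so it is only an order-of-magnitude guide for job scheduling.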
>
>P.
>
>I will try to send this to the Wiki
>
>
>--
>Peter Murray-Rust
>Reader in Molecular Informatics
>Unilever Centre, Dep. Of Chemistry
>University of Cambridge
>CB2 1EW, UK
>+44-1223-763069
>
>
