RDKit 2009.Q1

Cheminformatics and Machine Learning Software

What's new in RDKit 2009.Q1:

  • The directory structure of the distribution has been changed to make installation of the RDKit Python modules more straightforward. Specifically, the directory $RDBASE/Python has been renamed to $RDBASE/rdkit, and the Python code now expects $RDBASE to be in your PYTHONPATH. When importing RDKit Python modules, use "from rdkit import Chem" instead of "import Chem". Old code will continue to work if you also add $RDBASE/rdkit to your PYTHONPATH, but updating your scripts to reflect the new organization is strongly recommended.
  • For C++ programmers: There is a non-backwards-compatible change in the way atoms and bonds are stored on molecules. See the *Other* section for details.

Acknowledgements

  • Kirk DeLisle, Noel O'Boyle, Andrew Dalke, Peter Gedeck, Armin Widmer

Bug Fixes

  • Incorrect handling of 0s as ring closure digits (issues 2525792 and 2690982)
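
The import change described in the first item of this list can be sketched as a small compatibility helper. This is only an illustration, not part of RDKit: the function name import_chem is hypothetical, and the sketch assumes that either $RDBASE (new layout) or $RDBASE/rdkit (old layout) is on your PYTHONPATH.

```python
import importlib


def import_chem():
    """Return the RDKit Chem module, or None if it cannot be imported.

    Hypothetical helper: tries the 2009.Q1 package layout first
    ("from rdkit import Chem"), then the older top-level "import Chem".
    """
    for name in ("rdkit.Chem", "Chem"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


Chem = import_chem()
if Chem is None:
    print("RDKit not found; check that $RDBASE is on your PYTHONPATH")
```

New scripts would normally just write "from rdkit import Chem" directly; a fallback like this is only useful while old and new installations coexist.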
License: BSD License
Developer: Greg Landrum
RDKit is a Python library with data structures, algorithms, and scripts for cheminformatics.

General Molecular Functionality

• Input/Output: SMILES, mol, SDF, TDT
• “Cheminformatics”:
   – Substructure searching with SMARTS
   – Canonical SMILES
   – Chirality support
   – Easy serialization (molecule text)
• 2D depiction (including constrained depiction)
• 2D -> 3D coordinate generation via distance geometry
• UFF implementation for cleaning up geometries
• Fingerprinting (Daylight-like, “MACCS keys”, etc.)
• Similarity/diversity picking
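
As a rough sketch of the SMILES input, canonical SMILES, and SMARTS substructure searching items above, the following assumes a working RDKit installation that uses the "from rdkit import Chem" layout. The wrapper name canonical_smiles_and_match is hypothetical; the Chem calls it uses (MolFromSmiles, MolFromSmarts, MolToSmiles, HasSubstructMatch) are standard RDKit functions.

```python
def canonical_smiles_and_match(smiles, smarts):
    """Return (canonical SMILES, substructure-match flag) for one molecule.

    Hypothetical wrapper for illustration; requires RDKit to be installed.
    """
    # Imported here so that merely defining the function does not need RDKit.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)      # None if the SMILES cannot be parsed
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % (smiles,))

    pattern = Chem.MolFromSmarts(smarts)  # None if the SMARTS cannot be parsed
    if pattern is None:
        raise ValueError("could not parse SMARTS: %r" % (smarts,))

    return Chem.MolToSmiles(mol), mol.HasSubstructMatch(pattern)
```

For example, canonical_smiles_and_match("OCC", "[OX2H]") parses ethanol and checks it against a hydroxyl pattern.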

General Molecular Functionality (continued)

• Subgraph/Fragment analysis
• Gasteiger charges
• Shape-based similarity
• Molecule-molecule alignment
• Molecular transformations (using SMARTS)

General “QSAR” Functionality

• Molecular descriptor library:
   – Topological (κ3, Balaban J, etc.)
   – Electrotopological state (EState)
   – ClogP, MR
   – “MOE like” VSA descriptors
   – others
• Learning:
   – Clustering
   – Decision trees, naïve Bayes, kNN (the naïve Bayes and kNN implementations are functional, but not great)
   – Bagging, random forests
   – Infrastructure:
       • data splitting
       • shuffling
       • out-of-bag classification
       • serializable models
       • enrichment plots, screening, etc.
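
To make the bagging and out-of-bag items above concrete, here is a minimal pure-Python sketch of the general idea. It is not RDKit's ML code; the function name bootstrap_split and its parameters are assumptions made for illustration. Each model in a bagged ensemble is trained on a bootstrap sample (drawn with replacement), and the examples that sample missed can be used to evaluate that model.

```python
import random


def bootstrap_split(n_items, seed=0):
    """Return (in_bag, out_of_bag) index lists for one bootstrap sample.

    Illustrative only: in_bag holds n_items indices drawn with replacement,
    out_of_bag holds every index that was never drawn.
    """
    rng = random.Random(seed)  # seeded so that the split is reproducible
    in_bag = [rng.randrange(n_items) for _ in range(n_items)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_items) if i not in chosen]
    return in_bag, out_of_bag


in_bag, out_of_bag = bootstrap_split(20, seed=1)
print(len(in_bag), len(out_of_bag))
```

The out-of-bag indices act as a built-in hold-out set for the model trained on the matching in-bag sample.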

Command Line Tools

• ML/BuildComposite.py: build models
• ML/ScreenComposite.py: screen models
• ML/EnrichPlot.py: generate enrichment plot data
• ML/AnalyzeComposite.py: analyze models (descriptor levels)
• Chem/Fingerprints/FingerprintMols.py: generate 2D fingerprints
• Chem/BuildFragmentCatalog.py: CASE-type analysis with a
  hierarchical catalog
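
For readers unfamiliar with enrichment plots, the sketch below shows the kind of data such a plot is built from. It is a generic pure-Python illustration written for Python 3, not the algorithm or output format of ML/EnrichPlot.py; the function name enrichment_curve and its arguments are hypothetical.

```python
def enrichment_curve(scores, is_active):
    """Return (fraction screened, fraction of actives found) points.

    Illustrative only: compounds are ranked by descending score, and the
    curve records how quickly the known actives are recovered.
    """
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Indices of the compounds, best-scoring first.
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=True)

    points = [(0.0, 0.0)]
    found = 0
    for rank, idx in enumerate(order, start=1):
        if is_active[idx]:
            found += 1
        points.append((rank / n_total, found / n_actives))
    return points


curve = enrichment_curve([0.9, 0.8, 0.3, 0.1], [True, False, True, False])
```

A model that ranks actives early produces a curve that rises well above the diagonal expected from random ordering.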

Last updated on April 3rd, 2009

