Publications Search
Explore how scientists all over the world use DrugBank in their research.
Published on March 27, 2007

Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry.

Authors: Kind T, Fiehn O

Abstract: BACKGROUND: Structure elucidation of unknown small molecules by mass spectrometry is a challenge despite advances in instrumentation. The first crucial step is to obtain correct elemental compositions. In order to automatically constrain the thousands of possible candidate structures, rules need to be developed to select the most likely and chemically correct molecular formulas. RESULTS: An algorithm for filtering molecular formulas is derived from seven heuristic rules: (1) restrictions for the number of elements, (2) LEWIS and SENIOR chemical rules, (3) isotopic patterns, (4) hydrogen/carbon ratios, (5) element ratio of nitrogen, oxygen, phosphorus, and sulphur versus carbon, (6) element ratio probabilities and (7) presence of trimethylsilylated compounds. Formulas are ranked according to their isotopic patterns and subsequently constrained by presence in public chemical databases. The seven rules were developed on 68,237 existing molecular formulas and were validated in four experiments. First, 432,968 formulas covering five million PubChem database entries were checked for consistency. Only 0.6% of these compounds did not pass all rules. Next, the rules were shown to effectively reduce the complement of all eight billion theoretically possible C, H, N, S, O, P-formulas up to 2000 Da to only 623 million most probable elemental compositions. Thirdly, 6,000 pharmaceutical, toxic and natural compounds were selected from DrugBank, TSCA and DNP databases. The correct formulas were retrieved as top hit at 80-99% probability when assuming data acquisition with complete resolution of unique compounds and 5% absolute isotope ratio deviation and 3 ppm mass accuracy. Last, some exemplary compounds were analyzed by Fourier transform ion cyclotron resonance mass spectrometry and by gas chromatography-time of flight mass spectrometry. In each case, the correct formula was ranked as top hit when combining the seven rules with database queries.
CONCLUSION: The seven rules enable an automatic exclusion of molecular formulas which are either wrong or which contain an unlikely high or low number of elements. The correct molecular formula is assigned with a probability of 98% if the formula exists in a compound database. For truly novel compounds that are not present in databases, the correct formula is found in the first three hits with a probability of 65-81%. Corresponding software and supplemental data are available for download from the authors' website.
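To make rules (4) and (5) and the 3 ppm mass accuracy criterion more concrete, the following is a minimal Python sketch of an element-ratio filter combined with a mass tolerance check. It is not the authors' software. The ratio limits in RATIO_LIMITS are illustrative assumptions that do not come from the abstract, all function names are hypothetical, and the mass is computed for the neutral molecule without any adduct or charge correction.

```python
import re

# Assumed, illustrative element-to-carbon ratio limits (not from the abstract).
RATIO_LIMITS = {
    "H": (0.2, 3.1),
    "N": (0.0, 1.3),
    "O": (0.0, 1.2),
    "P": (0.0, 0.3),
    "S": (0.0, 0.8),
}

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONO_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376200,
    "S": 31.97207117,
}


def parse_formula(formula):
    """Turn a simple formula such as 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def passes_ratio_rules(formula):
    """Check every element/carbon ratio against the assumed limits."""
    counts = parse_formula(formula)
    carbon = counts.get("C", 0)
    if carbon == 0:
        return False  # ratios versus carbon are undefined without carbon
    for element, (low, high) in RATIO_LIMITS.items():
        ratio = counts.get(element, 0) / carbon
        if not low <= ratio <= high:
            return False
    return True


def monoisotopic_mass(formula):
    """Monoisotopic mass of the neutral molecule in Da."""
    return sum(MONO_MASS[el] * n for el, n in parse_formula(formula).items())


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def within_tolerance(measured, formula, tol_ppm=3.0):
    """True if the measured mass matches the formula within tol_ppm."""
    return abs(ppm_error(measured, monoisotopic_mass(formula))) <= tol_ppm


if __name__ == "__main__":
    for candidate in ["C6H12O6", "CH12", "C7H16O5"]:
        print(candidate,
              passes_ratio_rules(candidate),
              within_tolerance(180.0634, candidate))
```

A candidate formula would be kept only if both checks succeed. A fuller filter would also need the remaining rules, such as isotopic pattern matching and the LEWIS and SENIOR checks, which are not sketched here.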
Published in January 2007

HMDB: the Human Metabolome Database.

Authors: Wishart DS, Tzur D, Knox C, Eisner R, Guo AC, Young N, Cheng D, Jewell K, Arndt D, Sawhney S, Fung C, Nikolai L, Lewis M, Coutouly MA, Forsythe I, Tang P, Shrivastava S, Jeroncic K, Stothard P, Amegbey G, Block D, Hau DD, Wagner J, Miniaci J, Clements M, Gebremedhin M, Guo N, Zhang Y, Duggan GE, Macinnis GD, Weljie AM, Dowlatabadi R, Bamforth F, Clive D, Greiner R, Li L, Marrie T, Sykes BD, Vogel HJ, Querengesser L

Abstract: The Human Metabolome Database (HMDB) is currently the most complete and comprehensive curated collection of human metabolite and human metabolism data in the world. It contains records for more than 2180 endogenous metabolites with information gathered from thousands of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the HMDB also contains an extensive collection of experimental metabolite concentration data compiled from hundreds of mass spectrometry (MS) and nuclear magnetic resonance (NMR) metabolomic analyses performed on urine, blood and cerebrospinal fluid samples. This is further supplemented with thousands of NMR and MS spectra collected on purified, reference metabolites. Each metabolite entry in the HMDB contains an average of 90 separate data fields including a comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, biofluid concentrations, disease associations, pathway information, enzyme data, gene sequence data, SNP and mutation data as well as extensive links to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided. The HMDB is designed to address the broad needs of biochemists, clinical chemists, physicians, medical geneticists, nutritionists and members of the metabolomics community. The HMDB is available at:
Published on January 23, 2007

Pathways and genes differentially expressed in the motor cortex of patients with sporadic amyotrophic lateral sclerosis.

Authors: Lederer CW, Torrisi A, Pantelidou M, Santama N, Cavallaro S

Abstract: BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a fatal disorder caused by the progressive degeneration of motoneurons in brain and spinal cord. Despite identification of disease-linked mutations, the diversity of processes involved and the ambiguity of their relative importance in ALS pathogenesis still represent a major impediment to disease models as a basis for effective therapies. Moreover, the human motor cortex, although critical to ALS pathology and physiologically altered in most forms of the disease, has not been screened systematically for therapeutic targets. RESULTS: By whole-genome expression profiling and stringent significance tests we identify genes and gene groups de-regulated in the motor cortex of patients with sporadic ALS, and interpret the role of individual candidate genes in a framework of differentially expressed pathways. Our findings emphasize the importance of defense responses and cytoskeletal, mitochondrial and proteasomal dysfunction, reflect reduced neuronal maintenance and vesicle trafficking, and implicate impaired ion homeostasis and glycolysis in ALS pathogenesis. Additionally, we compared our dataset with publicly available data for the SALS spinal cord, and show a high correlation of changes linked to the diseased state in the SALS motor cortex. In an analogous comparison with data for the Alzheimer's disease hippocampus we demonstrate a low correlation of global changes and a moderate correlation for changes specifically linked to the SALS diseased state. CONCLUSION: Gene and sample numbers investigated allow pathway- and gene-based analyses by established error-correction methods, drawing a molecular portrait of the ALS motor cortex that faithfully represents many known disease features and uncovers several novel aspects of ALS pathology. Contrary to expectations for a tissue under oxidative stress, nuclear-encoded mitochondrial genes are uniformly down-regulated. 
Moreover, the down-regulation of mitochondrial and glycolytic genes implies a combined reduction of mitochondrial and cytoplasmic energy supply, with a possible role in the death of ALS motoneurons. Identifying candidate genes exclusively expressed in non-neuronal cells, we also highlight the importance of these cells in disease development in the motor cortex. Notably, some pathways and candidate genes identified by this study are direct or indirect targets of medication already applied to unrelated illnesses and point the way towards the rapid development of effective symptomatic ALS therapies.
Published on November 14, 2006

DrugMetZ DB: an anthology of human drug metabolizing Cytochrome P450 enzymes.

Authors: Antony TR, Nagarajan S

Abstract: UNLABELLED: Understanding the basics of Cytochrome P450 (P450 or CYP) will help to discern drug metabolism. CYPs, a superfamily of heme-thiolate proteins, are found in almost all living organisms and are involved in the biotransformation of a diverse range of xenobiotics, therapeutic drugs and toxins. Here, we describe DrugMetZ DB, a database for CYP metabolizing drugs. The DB is implemented in MySQL, PHP and HTML. AVAILABILITY:
Published on July 1, 2006

TarFisDock: a web server for identifying drug targets with docking approach.

Authors: Li H, Gao Z, Kang L, Zhang H, Yang K, Yu K, Luo X, Zhu W, Chen K, Shen J, Wang X, Jiang H

Abstract: TarFisDock is a web-based tool for automating the procedure of searching for small molecule-protein interactions over a large repertoire of protein structures. It offers PDTD (potential drug target database), a target database containing 698 protein structures covering 15 therapeutic areas and a reverse ligand-protein docking program. In contrast to conventional ligand-protein docking, reverse ligand-protein docking aims to seek potential protein targets by screening an appropriate protein database. The input file of this web server is the small molecule to be tested, in standard mol2 format; TarFisDock then searches for possible binding proteins for the given small molecule by use of a docking approach. The ligand-protein interaction energy terms of the program DOCK are adopted for ranking the proteins. To test the reliability of the TarFisDock server, we searched the PDTD for putative binding proteins for vitamin E and 4H-tamoxifen. The top 2 and 10% candidates of vitamin E binding proteins identified by TarFisDock respectively cover 30 and 50% of reported targets verified or implicated by experiments; and 30 and 50% of experimentally confirmed targets for 4H-tamoxifen appear amongst the top 2 and 5% of the TarFisDock predicted candidates, respectively. Therefore, TarFisDock may be a useful tool for target identification, mechanism study of old drugs and probes discovered from natural products. TarFisDock and PDTD are available at
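The coverage figures quoted above (for example, 30% of reported targets within the top 2% of candidates) can be computed from a ranked candidate list as in the following minimal Python sketch. It is not TarFisDock code. The function names and protein identifiers are hypothetical, and the ranking assumes that a lower (more negative) interaction energy means a better candidate.

```python
import math


def rank_by_energy(energies):
    """Return protein names ordered from lowest to highest interaction energy.

    Assumption: lower (more negative) energy indicates a better candidate.
    """
    return sorted(energies, key=energies.get)


def coverage_at_top_percent(ranked_candidates, known_targets, percent):
    """Share of known targets that appear in the top `percent` of the ranking."""
    known = set(known_targets)
    if not known:
        raise ValueError("known_targets must not be empty")
    # Round the cutoff up so that a small percentage still keeps one candidate.
    cutoff = math.ceil(len(ranked_candidates) * percent / 100)
    top = set(ranked_candidates[:cutoff])
    return len(top & known) / len(known)


if __name__ == "__main__":
    # Hypothetical energies for 100 made-up proteins P0..P99.
    energies = {f"P{i}": -50.0 + i * 0.5 for i in range(100)}
    ranked = rank_by_energy(energies)
    known = ["P0", "P1", "P50", "P99"]
    for percent in (2, 10, 60):
        print(percent, coverage_at_top_percent(ranked, known, percent))
```

Reporting coverage at several cutoffs, as the abstract does for 2%, 5% and 10%, shows how quickly the known targets accumulate near the top of the ranking.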
Published on June 27, 2006

The Autoimmune Disease Database: a dynamically compiled literature-derived database.

Authors: Karopka T, Fluck J, Mevissen HT, Glass A

Abstract: BACKGROUND: Autoimmune diseases are disorders caused by an immune response directed against the body's own organs, tissues and cells. In practice more than 80 clinically distinct diseases, among them systemic lupus erythematosus and rheumatoid arthritis, are classified as autoimmune diseases. Although their etiology is unclear, these diseases share certain similarities at the molecular level, e.g. susceptibility regions on the chromosomes or the involvement of common genes. A manual literature review is not a feasible way to gain an overview of these related diseases; instead, automated methods are required to analyse the more than 500,000 Medline documents related to autoimmune disorders. RESULTS: In this paper we present the first version of the Autoimmune Disease Database, which to our knowledge is the first comprehensive literature-based database covering all known or suspected autoimmune diseases. This dynamically compiled database allows researchers to link autoimmune diseases to the candidate genes or proteins through the use of named entity recognition, which identifies genes/proteins in the corresponding Medline abstracts. The Autoimmune Disease Database covers 103 autoimmune disease concepts. This list was expanded to include synonyms and spelling variants, yielding a list of over 1,200 disease names. The current version of the database provides links to 541,690 abstracts and over 5,000 unique genes/proteins. CONCLUSION: The Autoimmune Disease Database provides the researcher with a tool to navigate potential gene-disease relationships in Medline abstracts in the context of autoimmune diseases.