Publications Search
Explore how scientists all over the world use DrugBank in their research.
Published on July 1, 2008

Detection of IUPAC and IUPAC-like chemical names.

Authors: Klinger R, Kolarik C, Fluck J, Hofmann-Apitius M, Friedrich CM

Abstract: MOTIVATION: Chemical compounds like small signal molecules or other biologically active chemical substances are an important entity class in life science publications and patents. Several representations and nomenclatures for chemicals like SMILES, InChI, IUPAC or trivial names exist. Only SMILES and InChI names allow a direct structure search, but in biomedical texts trivial names and IUPAC-like names are used more frequently. While trivial names can be found with a dictionary-based approach and in such a way mapped to their corresponding structures, it is not possible to enumerate all IUPAC names. In this work, we present a new machine learning approach based on conditional random fields (CRF) to find mentions of IUPAC and IUPAC-like names in scientific text as well as its evaluation and the conversion rate with available name-to-structure tools. RESULTS: We present an IUPAC name recognizer with an F1 measure of 85.6% on a MEDLINE corpus. The evaluation of different CRF orders and offset conjunction orders demonstrates the importance of these parameters. An evaluation of hand-selected patent sections containing large enumerations and terms with mixed nomenclature shows a good performance on these cases (F1 measure 81.5%). Remaining recognition problems are to detect correct borders of the typically long terms, especially when occurring in parentheses or enumerations. We demonstrate the scalability of our implementation by providing results from a full MEDLINE run. AVAILABILITY: We plan to publish the corpora, annotation guideline as well as the conditional random field model as a UIMA component.
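
For readers unfamiliar with the F measure reported in the abstract above, it is the harmonic mean of precision and recall. The following is a minimal sketch, assuming entity-level scoring in which a predicted name counts only if its character span matches a gold span exactly; the function name `f1_score` and the toy spans are hypothetical and are not taken from the paper.

```python
def f1_score(gold, predicted):
    """Entity-level F1 over exact-match (start, end) spans."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)  # share of predictions that are correct
    recall = true_pos / len(gold)          # share of gold names that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: the second prediction ends too early (a border error).
gold_spans = [(0, 12), (30, 55), (70, 91)]
predicted_spans = [(0, 12), (30, 50), (70, 91)]
score = f1_score(gold_spans, predicted_spans)
```

Under exact-match scoring, a single wrong border lowers both precision and recall, which is one reason the border detection problem mentioned in the abstract matters for long names.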
Published on March 4, 2008

A global view of drug-therapy interactions.

Authors: Nacher JC, Schwartz JM

Abstract: BACKGROUND: Network science is already making an impact on the study of complex systems and offers a promising variety of tools to understand their formation and evolution in many disparate fields from technological networks to biological systems. Even though new high-throughput technologies have rapidly been generating large amounts of genomic data, drug design has not followed the same development, and it is still complicated and expensive to develop new single-target drugs. Nevertheless, recent approaches suggest that multi-target drug design combined with a network-dependent approach and large-scale systems-oriented strategies create a promising framework to combat complex multi-genetic disorders like cancer or diabetes. RESULTS: We here investigate the human network corresponding to the interactions between all US-approved drugs and human therapies, defined by known relationships between drugs and their therapeutic applications. Our results show that the average paths in this drug-therapy network are shorter than three steps, indicating that distant therapies are separated by a surprisingly low number of chemical compounds. We also identify a sub-network composed of drugs with high centrality measures in the drug-therapy network, which represent the structural backbone of this system and act as hubs routing information between distant parts of the network. CONCLUSION: These findings provide for the first time a global map of the large-scale organization of all known drugs and associated therapies, bringing new insights into possible strategies for future drug development. Special attention should be given to drugs which combine the two properties of (a) having a high centrality value in the drug-therapy network and (b) acting on multiple molecular targets in the human system.
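
The average path length mentioned in the abstract above can be illustrated with a small sketch. It assumes an undirected bipartite graph whose edges link a drug to a therapy and measures distance in edges using breadth-first search. The toy drug and therapy names are hypothetical, and the paper may count steps differently (for example, on a projected network), so this shows the general technique rather than the authors' procedure.

```python
from collections import defaultdict, deque
from itertools import combinations


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_path_length(graph, nodes):
    """Mean shortest-path length over all connected pairs drawn from nodes."""
    nodes = sorted(nodes)
    dist_from = {n: bfs_distances(graph, n) for n in nodes}
    lengths = [dist_from[a][b] for a, b in combinations(nodes, 2)
               if b in dist_from[a]]
    return sum(lengths) / len(lengths)


# Hypothetical drug-therapy edges, not data from the paper.
edges = [("drugA", "T1"), ("drugA", "T2"), ("drugB", "T2"), ("drugB", "T3")]
graph = defaultdict(set)
for drug, therapy in edges:
    graph[drug].add(therapy)
    graph[therapy].add(drug)

avg = average_path_length(graph, ["T1", "T2", "T3"])
```

In this toy graph, T1 and T3 are four edges apart because reaching one from the other passes through two drugs and the shared therapy T2.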
Published in February 2008

Quantitative systems-level determinants of human genes targeted by successful drugs.

Authors: Yao L, Rzhetsky A

Abstract: What makes a successful drug target? A target molecule with an appropriate (druggable) tertiary structure is a necessary but not a sufficient condition for success. Here we analyzed specific properties of human genes and proteins targeted by 919 FDA-approved drugs and identified several quantitative measures that distinguish them from other genes and proteins at a highly significant level. Compared to an average gene and its encoded protein(s), successful drug targets are more highly connected (but far from being the most highly connected), have higher betweenness values, lower entropies of tissue expression, and lower ratios of nonsynonymous to synonymous single-nucleotide polymorphisms. Furthermore, we have identified human tissues that are significantly over- or undertargeted relative to the full spectrum of genes that are active in each tissue. Our study provides quantitative guidelines that could aid in the computational screening of new drug targets in human cells.
Published on February 19, 2008

PDTD: a web-accessible protein database for drug target identification.

Authors: Gao Z, Li H, Zhang H, Liu X, Kang L, Luo X, Zhu W, Chen K, Wang X, Jiang H

Abstract: BACKGROUND: Target identification is important for modern drug discovery. With the advances in the development of molecular docking, potential binding proteins may be discovered by docking a small molecule to a repository of proteins with three-dimensional (3D) structures. To complete this task, a reverse docking program and a drug target database with 3D structures are necessary. To this end, we have developed a web server tool, TarFisDock (Target Fishing Docking; http://www.dddc.ac.cn/tarfisdock), which has been used widely by others. Recently, we have constructed a protein target database, Potential Drug Target Database (PDTD), and have integrated PDTD with TarFisDock. This combination aims to assist target identification and validation. DESCRIPTION: PDTD is a web-accessible protein database for in silico target identification. It currently contains >1100 protein entries with 3D structures presented in the Protein Data Bank. The data are extracted from the literature and several online databases such as TTD, DrugBank and Thomson Pharma. The database covers diverse information on >830 known or potential drug targets, including protein and active-site structures in both PDB and mol2 formats, related diseases, biological functions as well as associated regulating (signaling) pathways. Each target is categorized by both nosology and biochemical function. PDTD supports keyword search by fields such as PDB ID, target name, and disease name. Data sets generated by PDTD can be viewed with the plug-in of molecular visualization tools and can also be downloaded freely. Remarkably, PDTD is specially designed for target identification. In conjunction with TarFisDock, PDTD can be used to identify binding proteins for small molecules. The results can be downloaded in the form of a mol2 file with the binding pose of the probe compound and a list of potential binding targets according to their ranking scores. CONCLUSION: PDTD serves as a comprehensive and unique repository of drug targets. Integrated with TarFisDock, PDTD is a useful resource to identify binding proteins for active compounds or existing drugs. Its potential applications include in silico drug target identification, virtual screening, and the discovery of the secondary effects of an old drug (i.e. new pharmacological usage) or an existing target (i.e. new pharmacological or toxic relevance); thus it may be a valuable platform for pharmaceutical researchers. PDTD is available online at http://www.dddc.ac.cn/pdtd/.
Published in January 2008

SuperTarget and Matador: resources for exploring drug-target relationships.

Authors: Gunther S, Kuhn M, Dunkel M, Campillos M, Senger C, Petsalaki E, Ahmed J, Urdiales EG, Gewiess A, Jensen LJ, Schneider R, Skoblo R, Russell RB, Bourne PE, Bork P, Preissner R

Abstract: The molecular basis of drug action is often not well understood. This is partly because the very abundant and diverse information generated in the past decades on drugs is hidden in millions of medical articles or textbooks. Therefore, we developed a one-stop data warehouse, SuperTarget, that integrates drug-related information about medical indication areas, adverse drug effects, drug metabolization, pathways and Gene Ontology terms of the target proteins. An easy-to-use query interface enables the user to pose complex queries, for example to find drugs that target a certain pathway, interacting drugs that are metabolized by the same cytochrome P450 or drugs that target the same protein but are metabolized by different enzymes. Furthermore, we provide tools for 2D drug screening and sequence comparison of the targets. The database contains more than 2500 target proteins, which are annotated with about 7300 relations to 1500 drugs; the vast majority of entries have pointers to the respective literature source. A subset of these drugs has been annotated with additional binding information and indirect interactions and is available as a separate resource called Matador. SuperTarget and Matador are available at http://insilico.charite.de/supertarget and http://matador.embl.de.
Published in January 2008

GLIDA: GPCR--ligand database for chemical genomics drug discovery--database and tools update.

Authors: Okuno Y, Tamon A, Yabuuchi H, Niijima S, Minowa Y, Tonomura K, Kunimoto R, Feng C

Abstract: G-protein coupled receptors (GPCRs) represent one of the most important families of drug targets in pharmaceutical development. GLIDA is a public GPCR-related Chemical Genomics database that is primarily focused on the integration of information between GPCRs and their ligands. It provides interaction data between GPCRs and their ligands, along with chemical information on the ligands, as well as biological information regarding GPCRs. These data are connected with each other in a relational database, allowing users in the field of Chemical Genomics research to easily retrieve such information from either biological or chemical starting points. GLIDA includes a variety of similarity search functions for the GPCRs and for their ligands. Thus, GLIDA can provide correlation maps linking the searched homologous GPCRs (or ligands) with their ligands (or GPCRs). By analyzing the correlation patterns between GPCRs and ligands, we can gain more detailed knowledge about their conserved molecular recognition patterns and improve drug design efforts by focusing on inferred candidates for GPCR-specific drugs. This article provides a summary of the GLIDA database and user facilities, and describes recent improvements to database design, data contents, ligand classification programs, similarity search options and graphical interfaces. GLIDA is publicly available at http://pharminfo.pharm.kyoto-u.ac.jp/services/glida/. We hope that it will prove very useful for Chemical Genomics research and GPCR-related drug discovery.
Published in January 2008

PROCOGNATE: a cognate ligand domain mapping for enzymes.

Authors: Bashton M, Nobeli I, Thornton JM

Abstract: PROCOGNATE is a database of protein cognate ligands for the domains in enzyme structures as described by CATH, SCOP and Pfam, and is available as an interactive website or a flat file. This article gives an overview of the database and its generation and presents a new website front end, as well as recent increased coverage in our dataset via inclusion of Pfam domains. We also describe navigation of the website and its features. The current version (1.3) of PROCOGNATE covers 4123, 4536 and 5876 structures and 377, 326 and 695 superfamilies/families in CATH, SCOP and Pfam, respectively. PROCOGNATE can be accessed at: http://www.ebi.ac.uk/thornton-srv/databases/procognate/
Published in January 2008

ChemBank: a small-molecule screening and cheminformatics resource database.

Authors: Seiler KP, George GA, Happ MP, Bodycombe NE, Carrinski HA, Norton S, Brudz S, Sullivan JP, Muhlich J, Serrano M, Ferraiolo P, Tolliday NJ, Schreiber SL, Clemons PA

Abstract: ChemBank (http://chembank.broad.harvard.edu/) is a public, web-based informatics environment developed through a collaboration between the Chemical Biology Program and Platform at the Broad Institute of Harvard and MIT. This knowledge environment includes freely available data derived from small molecules and small-molecule screens and resources for studying these data. ChemBank is unique among small-molecule databases in its dedication to the storage of raw screening data, its rigorous definition of screening experiments in terms of statistical hypothesis testing, and its metadata-based organization of screening experiments into projects involving collections of related assays. ChemBank stores an increasingly varied set of measurements derived from cells and other biological assay systems treated with small molecules. Analysis tools are available and are continuously being developed that allow the relationships between small molecules, cell measurements, and cell states to be studied. Currently, ChemBank stores information on hundreds of thousands of small molecules and hundreds of biomedically relevant assays that have been performed at the Broad Institute by collaborators from the worldwide research community. The goal of ChemBank is to provide life scientists unfettered access to biomedically relevant data and tools heretofore available primarily in the private sector.
Published on January 23, 2008

Gene characterization index: assessing the depth of gene annotation.

Authors: Kemmer D, Podowski RM, Yusuf D, Brumm J, Cheung W, Wahlestedt C, Lenhard B, Wasserman WW

Abstract: BACKGROUND: We introduce the Gene Characterization Index, a bioinformatics method for scoring the extent to which a protein-encoding gene is functionally described. Inherently a reflection of human perception, the Gene Characterization Index is applied for assessing the characterization status of individual genes, thus serving the advancement of both genome annotation and applied genomics research by rapid and unbiased identification of groups of uncharacterized genes for diverse applications such as directed functional studies and delineation of novel drug targets. METHODOLOGY/PRINCIPAL FINDINGS: The scoring procedure is based on a global survey of researchers, who assigned characterization scores from 1 (poor) to 10 (extensive) for a sample of genes based on major online resources. By evaluating the survey as training data, we developed a bioinformatics procedure to assign gene characterization scores to all genes in the human genome. We analyzed snapshots of functional genome annotation over a period of 6 years to assess temporal changes reflected by the increase of the average Gene Characterization Index. Applying the Gene Characterization Index to genes within pharmaceutically relevant classes, we confirmed known drug targets as high-scoring genes and revealed potentially interesting novel targets with low characterization indexes. Removing known drug targets and genes linked to sequence-related patent filings from the entirety of indexed genes, we identified sets of low-scoring genes particularly suited for further experimental investigation. CONCLUSIONS/SIGNIFICANCE: The Gene Characterization Index is intended to serve as a tool to the scientific community and granting agencies for focusing resources and efforts on unexplored areas of the genome. The Gene Characterization Index is available from http://cisreg.ca/gci/.
Published in December 2007

Computational models to assign biopharmaceutics drug disposition classification from molecular structure.

Authors: Khandelwal A, Bahadduri PM, Chang C, Polli JE, Swaan PW, Ekins S

Abstract: PURPOSE: We applied in silico methods to automatically classify drugs according to the Biopharmaceutics Drug Disposition Classification System (BDDCS). MATERIALS AND METHODS: Models were developed using machine learning methods including recursive partitioning (RP), random forest (RF) and support vector machine (SVM) algorithms with ChemDraw, clogP, polar surface area, VolSurf and MolConnZ descriptors. The dataset consisted of 165 training and 56 test set molecules. RESULTS: RF model 3, RP model 1, and SVM model 1 can correctly predict 73.1, 63.6 and 78.6% of test compounds in classes 1, 2 and 3, respectively. Both RP and SVM models can be used for class 4 prediction. The inclusion of consensus analysis resulted in improved test set predictions for class 2 and 4 drugs. CONCLUSIONS: The models can be used to predict BDDCS class for new compounds from molecular structure using readily available molecular descriptors and software, representing an area where in silico approaches could aid the pharmaceutical industry in speeding drugs to the patient and reducing costs. This could have significant applications in drug discovery to identify molecules that may have future developability issues.