Publications Search
Explore how scientists all over the world use DrugBank in their research.
Published in 2016

Selectivity profiling of BCRP versus P-gp inhibition: from automated collection of polypharmacology data to multi-label learning.

Authors: Montanari F, Zdrazil B, Digles D, Ecker GF

Abstract: BACKGROUND: The human ATP binding cassette transporters Breast Cancer Resistance Protein (BCRP) and Multidrug Resistance Protein 1 (P-gp) are co-expressed in many tissues and barriers, especially at the blood-brain barrier and at the hepatocyte canalicular membrane. Understanding how they interact to affect the pharmacokinetics of drugs is of prime interest. In silico tools that predict inhibition and substrate profiles for BCRP and P-gp could serve as early filters in the drug discovery and development process. However, building such models requires collecting pharmacological data for both targets, a tedious task that often involves manual and poorly reproducible steps. RESULTS: Compounds with inhibitory activity measured against BCRP and/or P-gp were retrieved by combining Open Data and manually curated data from the literature using a KNIME workflow. After the compound overlap was determined, machine learning approaches were used to build multi-label classification models for BCRP/P-gp. Three ways of addressing multi-label problems were explored and compared: label powerset, binary relevance and classifier chains. Label powerset revealed important molecular features for selective or polyspecific inhibitory activity. In our dataset, only two descriptors (the numbers of hydrophobic and aromatic atoms) were sufficient to separate selective BCRP inhibitors from selective P-gp inhibitors. Dual inhibitors also share properties with both groups of selective inhibitors. Binary relevance and classifier chains improved the predictivity of the models. CONCLUSIONS: The KNIME workflow proved to be a useful tool for merging data from diverse sources. It could be used to build multi-label datasets for any set of pharmacological targets with data available either in the open domain or in-house. By applying various multi-label learning algorithms, important molecular features driving transporter selectivity could be retrieved. Finally, using the dataset with missing annotations, predictive models can be derived in cases where no accurate dense dataset is available (insufficient data overlap or a poorly balanced class distribution).
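The three multi-label strategies named above differ mainly in how they reshape a two-label (BCRP, P-gp) dataset before any classifier is trained. The minimal Python sketch below shows only that reshaping step. It assumes a hypothetical record layout in which None marks a missing annotation; it is not the authors' KNIME workflow, and the toy descriptor values are invented.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical record layout: (descriptor vector, (BCRP label, P-gp label)),
# where 1 = inhibitor, 0 = non-inhibitor and None = not measured.
Record = Tuple[Sequence[float], Tuple[Optional[int], Optional[int]]]

# Label powerset: every label combination becomes one class of a multiclass problem.
POWERSET_CLASSES = {
    (0, 0): "non-inhibitor",
    (1, 0): "selective BCRP",
    (0, 1): "selective P-gp",
    (1, 1): "dual",
}


def label_powerset(data: List[Record]):
    """One multiclass dataset; rows with a missing label cannot be mapped and are dropped."""
    return [(x, POWERSET_CLASSES[y]) for x, y in data if None not in y]


def binary_relevance(data: List[Record], n_labels: int = 2):
    """One independent binary dataset per label, keeping every row where that label is known."""
    return [[(x, y[j]) for x, y in data if y[j] is not None] for j in range(n_labels)]


def classifier_chain(data: List[Record], n_labels: int = 2):
    """Dataset j appends the true labels 0..j-1 to the features (the training-time chain)."""
    datasets = []
    for j in range(n_labels):
        rows = []
        for x, y in data:
            earlier = list(y[:j])
            if y[j] is None or None in earlier:
                continue  # this link needs label j and all earlier labels
            rows.append((list(x) + earlier, y[j]))
        datasets.append(rows)
    return datasets


# Toy compounds described by (hydrophobic atoms, aromatic atoms); values are invented.
data: List[Record] = [
    ((12, 6), (1, 0)),
    ((20, 18), (0, 1)),
    ((16, 12), (1, 1)),
    ((8, 6), (0, None)),
]

print(label_powerset(data))
print([len(d) for d in binary_relevance(data)])
```

Label powerset drops the compound whose P-gp label is missing, whereas binary relevance (and the first link of the chain) can still use it for the BCRP task, which is one way a dataset with missing annotations stays usable. At prediction time a classifier chain feeds each model's predicted labels, not the true ones, into the next model.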
Published in 2016

Web-based 3D-visualization of the DrugBank chemical space.

Authors: Awale M, Reymond JL

Abstract: BACKGROUND: Like the periodic table for elements, chemical space offers an organizing principle for representing the diversity of organic molecules, usually in the form of multi-dimensional property spaces that are subjected to dimensionality reduction to obtain 3D spaces or 2D maps suitable for visual inspection. Unfortunately, tools for viewing chemical space on the internet are currently very limited. RESULTS: Herein we present webDrugCS, a web application freely available at www.gdb.unibe.ch to visualize DrugBank (www.drugbank.ca, containing over 6000 investigational and approved drugs) in five different property spaces. WebDrugCS displays 3D clouds of color-coded grid points representing molecules; a molecule's structural formula is displayed on mouse-over, with an option to link to the corresponding molecule page on the DrugBank website. The 3D clouds are obtained by principal component analysis of high-dimensional property spaces describing constitution and topology (42D molecular quantum numbers, MQN), structural features (34D SMILES fingerprint, SMIfp), molecular shape (20D atom pair fingerprint, APfp), pharmacophores (55D atom category extended atom pair fingerprint, Xfp) and substructures (1024D binary substructure fingerprint, Sfp). User-defined molecules can be uploaded as SMILES lists and displayed together with DrugBank. In contrast to 2D maps, where many compounds fold onto each other, these 3D spaces have a resolution comparable to that of their parent high-dimensional chemical space. CONCLUSION: To the best of our knowledge, webDrugCS is the first publicly available web tool for interactive visualization and exploration of the DrugBank chemical space in 3D. WebDrugCS works on computers, tablets and phones, and facilitates visual exploration of DrugBank to rapidly learn about the structural diversity of small-molecule drugs.
Graphical abstract: webDrugCS visualization of DrugBank projected in 3D MQN space, color-coded by ring count, with the pointer showing the drug 5-fluorouracil.
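The projection step described above (principal component analysis of a high-dimensional descriptor matrix down to three coordinates) can be sketched in a few lines of NumPy, followed by snapping the coordinates onto an integer grid. The grid size, the rounding scheme and the random stand-in for 42D MQN vectors are assumptions for illustration, not webDrugCS internals.

```python
import numpy as np


def pca_project(X: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Project rows of X onto the top principal components via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def snap_to_grid(coords: np.ndarray, n_bins: int = 100) -> np.ndarray:
    """Rescale each axis to [0, n_bins - 1] and round to integer grid cells (assumed scheme)."""
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against a constant axis
    return np.rint((coords - lo) / span * (n_bins - 1)).astype(int)


rng = np.random.default_rng(0)
X = rng.integers(0, 20, size=(500, 42)).astype(float)  # random stand-in for 42D MQN vectors
coords = pca_project(X)
cells = snap_to_grid(coords)
print(coords.shape, cells.min(), cells.max())
```

Several molecules can land in the same grid cell; how webDrugCS resolves or color-codes such cells is not covered by this sketch.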
Published in 2016

Drug Repositioning for Cancer Therapy Based on Large-Scale Drug-Induced Transcriptional Signatures.

Authors: Lee H, Kang S, Kim W

Abstract: An in silico chemical genomics approach is developed to predict drug repositioning (DR) candidates for three types of cancer: glioblastoma, lung cancer, and breast cancer. It is based on a recent large-scale dataset of ~20,000 drug-induced expression profiles in multiple cancer cell lines, which provides i) a global picture of the transcriptional perturbation of both known targets and unknown off-targets, and ii) rich information on each drug's mode of action. First, using multiple HTS datasets as unbiased benchmarks, drug-induced expression profiles are shown to be more effective than other information, such as drug structure or known targets. In particular, the utility of our method was robustly demonstrated in identifying novel DR candidates. Second, we predicted 14 high-scoring DR candidates based solely on expression signatures. Eight of the fourteen drugs showed significant anti-proliferative activity against glioblastoma, namely ivermectin, trifluridine, astemizole, amlodipine, maprotiline, apomorphine, mometasone, and nortriptyline. Our DR score correlated strongly with the cell-based experimental results: the top seven DR candidates were positive, corresponding to an approximately 20-fold enrichment compared with conventional HTS. Despite diverse original indications and known targets, the perturbed pathways of active DR candidates show five distinct patterns that form tight clusters together with one or more known cancer drugs, suggesting common transcriptome-level mechanisms of anti-proliferative activity.
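Fold enrichment, as referred to above, is commonly computed as the hit rate among the top-ranked candidates divided by the hit rate of an unbiased screen. The sketch below assumes that definition. The abstract does not give the baseline HTS hit rate, so the 5% used here is a placeholder, and the position of the eighth active compound among ranks 8 to 14 is illustrative; only the counts (7 of the top 7 and 8 of 14 positive) come from the abstract.

```python
from typing import Sequence


def hit_rate(outcomes: Sequence[int]) -> float:
    """Fraction of tested candidates that were active (1 = active, 0 = inactive)."""
    return sum(outcomes) / len(outcomes)


def fold_enrichment(ranked_outcomes: Sequence[int], k: int, baseline_rate: float) -> float:
    """Hit rate among the top-k ranked candidates relative to a baseline screening hit rate."""
    return hit_rate(ranked_outcomes[:k]) / baseline_rate


# Top 7 active (from the abstract); placing the 8th active at rank 10 is illustrative only.
ranked = [1] * 7 + [0, 0, 1, 0, 0, 0, 0]
BASELINE_HTS_HIT_RATE = 0.05  # placeholder, not reported in the abstract

print(hit_rate(ranked))
print(fold_enrichment(ranked, 7, BASELINE_HTS_HIT_RATE))
```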
Published in 2016

Genome Sequence Variability Predicts Drug Precautions and Withdrawals from the Market.

Authors: Lee KH, Baik SY, Lee SY, Park CH, Park PJ, Kim JH

Abstract: Despite substantial premarket efforts, a significant portion of approved drugs has been withdrawn from the market for safety reasons. The deleterious impact of nonsynonymous substitutions on the structure and function of drug-related proteins, as predicted by the SIFT algorithm, was evaluated for 2504 personal genomes. Both withdrawn drugs (n = 154) and precautionary drugs (Beers criteria, n = 90; US FDA pharmacogenomic biomarkers, n = 96) showed significantly lower genomic deleteriousness scores (P < 0.001) than other drugs (n = 752). Furthermore, the rates of drug withdrawals and precautions correlated significantly with the deleteriousness scores of the drugs (P < 0.01); this trend was confirmed for all drugs included in the withdrawal and precaution lists of the United Nations, European Medicines Agency, DrugBank, Beers criteria, and US FDA. Our findings suggest that person-to-person genome sequence variability is a strong independent predictor of drug withdrawals and precautions. We propose novel measures of drug safety based on personal genome sequence analysis.
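The abstract does not spell out how per-drug deleteriousness scores were aggregated. As a purely hypothetical illustration, the sketch below takes, for each genome, the lowest SIFT score among the variants in a drug's related proteins (lower SIFT scores indicate more damaging substitutions, and scores below 0.05 are conventionally called deleterious), then summarizes across genomes. The aggregation rule, protein names and scores are all invented and are not the authors' measure.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

SIFT_DELETERIOUS = 0.05  # conventional cutoff: SIFT scores below this are predicted deleterious

Genome = Dict[str, List[float]]  # protein -> SIFT scores of the nonsynonymous variants carried


def worst_sift(drug_proteins: Sequence[str], genome: Genome) -> float:
    """Lowest (most damaging) SIFT score in the drug's proteins for one genome; 1.0 if none."""
    return min((s for p in drug_proteins for s in genome.get(p, [])), default=1.0)


def drug_summary(drug_proteins: Sequence[str], genomes: List[Genome]) -> Tuple[float, float]:
    """Hypothetical per-drug summary: mean worst SIFT score and fraction of genomes below cutoff."""
    per_genome = [worst_sift(drug_proteins, g) for g in genomes]
    frac = sum(s < SIFT_DELETERIOUS for s in per_genome) / len(per_genome)
    return mean(per_genome), frac


genomes = [
    {"PROT_A": [0.01, 0.30]},
    {"PROT_B": [0.20]},
    {},
]
print(drug_summary(["PROT_A", "PROT_B"], genomes))
```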
Published in December 2016

Mining the Proteome of Fusobacterium nucleatum subsp. nucleatum ATCC 25586 for Potential Therapeutics Discovery: An In Silico Approach.

Authors: Habib AM, Islam MS, Sohel M, Mazumder MH, Sikder MO, Shahik SM

Abstract: The wealth of bacterial genome sequence information available in recent years has ushered in many novel strategies for antibacterial drug discovery and has helped medical science take up the challenge of pathogenic bacteria's increasing resistance to current antibiotics. In this study, we adopted a subtractive genomics approach to analyze the whole genome sequence of Fusobacterium nucleatum, a human oral pathogen associated with colorectal cancer. Our study revealed 1,499 F. nucleatum proteins that have no homologs in the human genome. These proteins were further screened against the Database of Essential Genes (DEG), which identified 32 proteins vital to the bacterium. Subsequent analysis of these pivotal proteins with the Kyoto Encyclopedia of Genes and Genomes (KEGG) Automated Annotation Server (KAAS) singled out 3 key enzymes of F. nucleatum that may be good candidates as drug targets, since they are unique to the bacterium and absent in humans. In addition, we present the three-dimensional structures of these three proteins. Finally, ligand-binding sites were determined for 2 of the key proteins, and functional inhibitors that best fit those sites were screened to discover effective novel therapeutic compounds against F. nucleatum.
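The subtractive genomics funnel described above (whole proteome, then proteins without human homologs, then essential proteins, then enzymes unique to the pathogen) reduces to successive set operations once each screen has produced its list. In this sketch the input sets stand in for BLAST, DEG and KAAS results, which are assumed to have been computed elsewhere; the protein identifiers are hypothetical.

```python
from typing import Set, Tuple


def subtractive_screen(
    proteome: Set[str],
    human_homologs: Set[str],
    essential: Set[str],
    pathogen_specific: Set[str],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Successive filters: non-human-homologous -> essential -> pathogen-specific candidates."""
    non_homologous = proteome - human_homologs  # drop proteins with a human homolog
    essential_targets = non_homologous & essential  # keep proteins listed as essential
    candidates = essential_targets & pathogen_specific  # keep enzymes absent from human pathways
    return non_homologous, essential_targets, candidates


# Hypothetical identifiers; real inputs would come from homology and annotation searches.
proteome = {"prot1", "prot2", "prot3", "prot4", "prot5"}
human_homologs = {"prot1", "prot2"}
essential = {"prot2", "prot3", "prot4"}
pathogen_specific = {"prot4", "prot5"}

stages = subtractive_screen(proteome, human_homologs, essential, pathogen_specific)
print([sorted(s) for s in stages])
```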
Published in 2016

Scaffold analysis of PubChem database as background for hierarchical scaffold-based visualization.

Authors: Velkoborsky J, Hoksza D

Abstract: BACKGROUND: Visualization of large molecular datasets is a challenging yet important topic in diverse fields of chemistry, ranging from materials engineering to drug design. In drug design especially, modern high-throughput screening methods generate large amounts of molecular data that call for methods enabling their analysis. One such method is classification of compounds by their molecular scaffolds, a concept widely used by medicinal chemists to group molecules with similar properties. This classification can then be used for intuitive visualization of compounds. RESULTS: In this paper, we propose a scaffold hierarchy derived from a large-scale analysis of the PubChem Compound database. The analysis not only provides insights into the scaffold diversity of the PubChem Compound database, but also enables scaffold-based hierarchical visualization of user compound data sets against the background of empirical chemical space, as defined by the PubChem data, or against the background of any other user-defined data set. The visualization is performed by a web-based client-server application called Scaffvis. It provides an interactive, zoomable tree map visualization of data sets of up to hundreds of thousands of molecules. Scaffvis is free to use, and its source code has been published under an open source license.
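Scaffvis's own hierarchy and layout code are not reproduced here. The sketch below only illustrates the general idea of a scaffold tree map: each molecule is assumed to come with a precomputed path of scaffold keys from general to specific (the keys are hypothetical labels; computing real scaffolds would need a cheminformatics toolkit), node sizes are molecule counts, and one level is laid out with the simple slice-and-dice algorithm, chosen for brevity.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

Node = Tuple[str, ...]
Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def count_hierarchy(paths: Iterable[Sequence[str]]) -> Dict[Node, int]:
    """Count molecules under every prefix of their scaffold path (tree-map node sizes)."""
    counts: Dict[Node, int] = defaultdict(int)
    for path in paths:
        for depth in range(1, len(path) + 1):
            counts[tuple(path[:depth])] += 1
    return dict(counts)


def children(counts: Dict[Node, int], parent: Node = ()) -> Dict[Node, int]:
    """Direct children of a node; the top level when parent is ()."""
    n = len(parent)
    return {k: v for k, v in counts.items() if len(k) == n + 1 and k[:n] == parent}


def slice_layout(sizes: Dict[Node, int], x: float, y: float, w: float, h: float,
                 horizontal: bool = True) -> Dict[Node, Rect]:
    """Slice-and-dice: split the rectangle into strips proportional to node size."""
    total = sum(sizes.values())
    rects: Dict[Node, Rect] = {}
    offset = 0.0
    for key, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        frac = size / total
        if horizontal:
            rects[key] = (x + offset * w, y, frac * w, h)
        else:
            rects[key] = (x, y + offset * h, w, frac * h)
        offset += frac
    return rects


paths = [
    ("ring:benzene", "scaffold:S1"),
    ("ring:benzene", "scaffold:S1"),
    ("ring:benzene", "scaffold:S2"),
    ("ring:pyridine", "scaffold:S3"),
]
counts = count_hierarchy(paths)
layout = slice_layout(children(counts), 0, 0, 100, 50)
print(layout)
```

Zooming into a node amounts to calling slice_layout on that node's children inside its rectangle, usually alternating the orientation at each level.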
Published in 2016

The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins.

Authors: Rouillard AD, Gundersen GW, Fernandez NF, Wang Z, Monteiro CD, McDermott MG, Ma'ayan A

Abstract: Genomics, epigenomics, transcriptomics, proteomics and metabolomics efforts rapidly generate a plethora of data on the activity and levels of biomolecules within mammalian cells. At the same time, curation projects that organize knowledge from the biomedical literature into online databases are expanding. Hence, there is a wealth of information about genes, proteins and their associations, with an urgent need for data integration to achieve better knowledge extraction and data reuse. For this purpose, we developed the Harmonizome: a collection of processed datasets gathered from over 70 major online resources to serve and mine knowledge about genes and proteins. We extracted, abstracted and organized data into approximately 72 million functional associations between genes/proteins and their attributes. Such attributes could be physical relationships with other biomolecules, expression in cell lines and tissues, genetic associations with knockout mouse or human phenotypes, or changes in expression after drug treatment. We stored these associations in a relational database along with rich metadata for the genes/proteins, their attributes and the original resources. The freely available Harmonizome web portal provides a graphical user interface, a web service and a mobile app for querying, browsing and downloading all of the collected data. To demonstrate the utility of the Harmonizome, we computed and visualized gene-gene and attribute-attribute similarity networks and, through unsupervised clustering, identified many unexpected relationships by combining pairs of datasets, such as the association between kinase perturbations and disease signatures. We also applied supervised machine learning methods to predict novel substrates for kinases, endogenous ligands for G-protein coupled receptors and mouse phenotypes for knockout genes, and to classify unannotated transmembrane proteins by their likelihood of being ion channels. The Harmonizome is a comprehensive resource of knowledge about genes and proteins and, as such, enables researchers to discover novel relationships between biological entities and to form novel data-driven hypotheses for experimental validation. Database URL: http://amp.pharm.mssm.edu/Harmonizome
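Building a gene-gene similarity network from gene-attribute associations, as described above, can be sketched with set-based Jaccard similarity; transposing the association map gives the attribute-attribute network in the same way. Jaccard is an assumption chosen for simplicity (the abstract does not name the similarity measure), and the gene and attribute names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_network(assoc: Dict[str, Set[str]], threshold: float) -> List[Tuple[str, str, float]]:
    """Edges between entities whose association sets overlap at or above the threshold."""
    edges = []
    for u, v in combinations(sorted(assoc), 2):
        score = jaccard(assoc[u], assoc[v])
        if score >= threshold:
            edges.append((u, v, score))
    return edges


def transpose(assoc: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Flip gene -> attributes into attribute -> genes for the attribute-attribute network."""
    flipped: Dict[str, Set[str]] = defaultdict(set)
    for entity, attrs in assoc.items():
        for attr in attrs:
            flipped[attr].add(entity)
    return dict(flipped)


gene_attrs = {
    "GENE_A": {"attr1", "attr2", "attr3"},
    "GENE_B": {"attr2", "attr3"},
    "GENE_C": {"attr9"},
}
print(similarity_network(gene_attrs, 0.5))
print(similarity_network(transpose(gene_attrs), 1.0))
```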
Published in 2016

HerDing: herb recommendation system to treat diseases using genes and chemicals.

Authors: Choi W, Choi CH, Kim YR, Kim SJ, Na CS, Lee H

Abstract: In recent years, herbs have been researched as sources of new drug candidates because they have a long empirical history of treating diseases and are relatively free from side effects. Studies that scientifically demonstrate the medical efficacy of herbs for target diseases often spend considerable time and effort choosing candidate herbs and performing experiments to measure changes in marker genes after herb treatment. A computational approach that recommends herbs for treating diseases could improve efficiency in the early stages of such studies. Although several databases related to traditional Chinese medicine have already been developed, no specialized web tool yet recommends herbs to treat diseases based on disease-related genes. Therefore, we developed a novel search engine, HerDing, that retrieves candidate herb-related information from user search terms (a list of genes, a disease name, a chemical name or an herb name). HerDing was built by integrating public databases and applying a text-mining method. The HerDing website is free and open to all users, with no login requirement. Database URL: http://combio.gist.ac.kr/herding
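HerDing's ranking method is not described in the abstract. One simple way to turn a user's gene list into herb candidates, shown here as a hypothetical sketch, is to follow herb-to-chemical and chemical-to-target links and rank herbs by how many query genes their chemicals target. All herb, chemical and gene names below are placeholders.

```python
from typing import Dict, Iterable, List, Set, Tuple


def rank_herbs(
    query_genes: Set[str],
    herb_chemicals: Dict[str, Iterable[str]],
    chemical_targets: Dict[str, Set[str]],
) -> List[Tuple[str, int, List[str]]]:
    """Rank herbs by how many query genes are targeted by any of their chemicals."""
    ranked = []
    for herb, chemicals in herb_chemicals.items():
        # Union of the target genes of every chemical in this herb (empty if none are known).
        targets = set().union(*(chemical_targets.get(c, set()) for c in chemicals))
        hits = targets & query_genes
        if hits:
            ranked.append((herb, len(hits), sorted(hits)))
    # Most hits first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


query = {"G1", "G2", "G3"}
herb_chemicals = {"HERB_A": ["c1", "c2"], "HERB_B": ["c3"], "HERB_C": ["c4"]}
chemical_targets = {"c1": {"G1"}, "c2": {"G2", "G9"}, "c3": {"G3"}, "c4": {"G8"}}
print(rank_herbs(query, herb_chemicals, chemical_targets))
```

A real tool would likely weight hits by evidence or test them for statistical over-representation rather than counting them; raw counts are used here only to keep the sketch short.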
Published in 2016

GT2RDF: Semantic Representation of Genetic Testing Data.

Authors: Paul Rupa A, Singh S, Zhu Q

Abstract: Accelerated by the Human Genome Project, genetic testing has become an increasingly integral component in the diagnosis, treatment, management, and prevention of numerous diseases and conditions. More than 480 laboratories perform genetic tests for more than 4,600 rare and common medical conditions. These tests can effectively help health professionals determine or predict the genetic conditions of their patients. However, according to two nationwide surveys commissioned by UnitedHealth Group, physicians have not actively incorporated this innovative genetic technology into their clinical practice. To address the underuse of the large number of available genetic tests, we generated a single Resource Description Framework (RDF) resource, called GT2RDF (Genetic Testing data to RDF), by integrating information about diseases, genes, phenotypes, genetic tests, and drugs from multiple sources, including the Genetic Testing Registry (GTR), Online Mendelian Inheritance in Man (OMIM), MedGen, the Human Phenotype Ontology (HPO), ClinVar, and the National Drug File Reference Terminology (NDF-RT). In addition, we manually annotated and extracted information from 200 randomly selected GeneReviews chapters and integrated it into GT2RDF. We performed two case studies to demonstrate the usability of GT2RDF. GT2RDF will serve as a data foundation to support the design of a genetic testing recommendation system, called iGenetics, which will ultimately accelerate precision medicine by actively and effectively incorporating innovative genetic technology into clinical settings. Abbreviations: GT2RDF: Genetic Testing data to RDF; SWT: Semantic web technology; OWL: Web Ontology Language; RDF: Resource Description Framework; SPARQL: SPARQL Protocol and RDF Query Language; GTR: Genetic Testing Registry; OMIM: Online Mendelian Inheritance in Man; HPO: Human Phenotype Ontology; NDF-RT: National Drug File Reference Terminology; UMLS: Unified Medical Language System.
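To make the RDF integration concrete, the sketch below writes facts from different (hypothetical) source records as N-Triples lines and answers a simple triple-pattern query in which None plays the role of a SPARQL variable. The namespace, predicates and identifiers are invented for illustration and are not GT2RDF's actual vocabulary; a production pipeline would more likely use an RDF library and a SPARQL engine.

```python
from typing import List, Optional, Tuple

BASE = "http://example.org/gt2rdf/"  # hypothetical namespace, not GT2RDF's published IRIs
Triple = Tuple[str, str, str]


def iri(local: str) -> str:
    """IRI term in N-Triples syntax."""
    return f"<{BASE}{local}>"


def literal(text: str) -> str:
    """N-Triples string literal with backslash, quote and newline escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def serialize(triples: List[Triple]) -> str:
    """One 'subject predicate object .' line per triple."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


def match(triples: List[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Basic triple-pattern lookup; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)]


# Toy facts merged from two hypothetical sources (a test registry and a drug terminology).
triples = [
    (iri("test/T1"), iri("detectsCondition"), iri("disease/D1")),
    (iri("test/T1"), iri("analyzesGene"), iri("gene/G1")),
    (iri("drug/X1"), iri("treats"), iri("disease/D1")),
    (iri("disease/D1"), iri("label"), literal('Disease "D1"')),
]

# Which drugs treat a condition detected by test T1? (a two-pattern join)
conditions = [o for _, _, o in match(triples, s=iri("test/T1"), p=iri("detectsCondition"))]
drugs = [s for c in conditions for s, _, _ in match(triples, p=iri("treats"), o=c)]
print(serialize(triples))
print(drugs)
```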
Published in December 2016

Drug repurposing for glioblastoma based on molecular subtypes.

Authors: Chen Y, Xu R

Abstract: A recent multi-platform analysis by The Cancer Genome Atlas identified four distinct molecular subtypes of glioblastoma (GBM) and demonstrated that the subtypes correlate with clinical phenotypes and treatment responses. In this study, we developed a computational drug repurposing approach to predict GBM drugs based on the molecular subtypes. Our approach leverages the genomic signature of each GBM subtype and integrates human cancer genomics with mouse phenotype data to identify opportunities for reusing FDA-approved agents to treat specific GBM subtypes. Specifically, we first constructed a phenotype profile for each GBM subtype from its genomic signature. For each approved drug, we also constructed a phenotype profile from the drug's target genes. We then developed an algorithm to match and prioritize drugs based on their phenotypic similarity to the GBM subtypes. Our approach is highly generalizable to other disorders, given a list of disorder-specific genes. We first evaluated the approach in predicting drugs for GBM as a whole. For a combined set of approved, potential and off-label GBM drugs, we achieved a median rank of 9.3%, which is significantly higher (p