Publications Search
Explore how scientists all over the world use DrugBank in their research.
Published on January 8, 2019

ETCM: an encyclopaedia of traditional Chinese medicine.

Authors: Xu HY, Zhang YQ, Liu ZM, Chen T, Lv CY, Tang SH, Zhang XB, Zhang W, Li ZY, Zhou RR, Yang HJ, Wang XJ, Huang LQ

Abstract: Traditional Chinese medicine (TCM) is not only an effective solution for primary health care, but also a great resource for drug innovation and discovery. To meet the increasing needs for TCM-related data resources, we developed ETCM, an Encyclopedia of Traditional Chinese Medicine. ETCM includes comprehensive and standardized information for the commonly used herbs and formulas of TCM, as well as their ingredients. The herb basic property and quality control standard, formula composition, ingredient drug-likeness, as well as much other information provided by ETCM can serve as a convenient resource for users to obtain thorough information about a herb or a formula. To facilitate functional and mechanistic studies of TCM, ETCM provides predicted target genes of TCM ingredients, herbs, and formulas, according to the chemical fingerprint similarity between TCM ingredients and known drugs. A systematic analysis function is also developed in ETCM, which allows users to explore the relationships or build networks among TCM herbs, formulas, ingredients, gene targets, and related pathways or diseases. ETCM is freely accessible. We expect ETCM to develop into a major data warehouse for TCM and to promote TCM-related research and drug development in the future.
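The abstract does not say which fingerprint or similarity measure ETCM uses. As a rough illustration only, the sketch below assumes fingerprints are sets of "on" bit indices compared with the Tanimoto coefficient, and that an ingredient inherits the targets of any known drug above a similarity cutoff. The function names, the 0.8 threshold, and the gene labels are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def predict_targets(ingredient_fp, known_drugs, threshold=0.8):
    """Collect targets of known drugs whose similarity to the ingredient meets the threshold.

    known_drugs maps a drug name to (fingerprint, set of target genes).
    Returns a dict of gene -> best similarity score that supported it.
    The threshold value is an assumption, not taken from ETCM.
    """
    targets = {}
    for _name, (drug_fp, genes) in known_drugs.items():
        score = tanimoto(ingredient_fp, drug_fp)
        if score >= threshold:
            for gene in genes:
                targets[gene] = max(score, targets.get(gene, 0.0))
    return targets


# Toy data with made-up bit indices and placeholder gene names.
known = {
    "drug_a": ({1, 2, 3, 4, 5, 6}, {"GENE_A"}),
    "drug_b": ({1, 9, 10}, {"GENE_B"}),
}
print(predict_targets({1, 2, 3, 4, 5}, known))
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.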
Published on January 7, 2019

Wireless Electrochemical Detection on a Microfluidic Compact Disc (CD) and Evaluation of Redox-Amplification during Flow.

Authors: Bauer M, Bartoli J, Martinez-Chapa SO, Madou M

Abstract: Novel biomarkers and lower limits of detection enable improved diagnostics. In this paper we analyze the influence of flow on the lower limit of electrochemical detection on a microfluidic Compact Disc (CD). Implementing wireless transfer of data reduces noise during measurements and allows for real-time sensing, demonstrated with the ferri-ferrocyanide redox-couple in single and dual mode cyclic voltammetry. The impact of flow on redox-amplification and electrode integration for the lowest limit of detection are discussed.
Published on January 7, 2019

BO-LSTM: classifying relations via long short-term memory networks along biomedical ontologies.

Authors: Lamurias A, Sousa D, Clarke LA, Couto FM

Abstract: BACKGROUND: Recent studies have proposed deep learning techniques, namely recurrent neural networks, to improve biomedical text mining tasks. However, these techniques rarely take advantage of existing domain-specific resources, such as ontologies. In Life and Health Sciences there is a vast and valuable set of such resources publicly available, which are continuously being updated. Biomedical ontologies are nowadays a mainstream approach to formalize existing knowledge about entities, such as genes, chemicals, phenotypes, and disorders. These resources contain supplementary information that may not be yet encoded in training data, particularly in domains with limited labeled data. RESULTS: We propose a new model to detect and classify relations in text, BO-LSTM, that takes advantage of domain-specific ontologies, by representing each entity as the sequence of its ancestors in the ontology. We implemented BO-LSTM as a recurrent neural network with long short-term memory units and using open biomedical ontologies, specifically Chemical Entities of Biological Interest (ChEBI), Human Phenotype, and Gene Ontology. We assessed the performance of BO-LSTM with drug-drug interactions mentioned in a publicly available corpus from an international challenge, composed of 792 drug descriptions and 233 scientific abstracts. By using the domain-specific ontology in addition to word embeddings and WordNet, BO-LSTM improved the F1-score of both the detection and classification of drug-drug interactions, particularly in a document set with a limited number of annotations. We adapted an existing DDI extraction model with our ontology-based method, obtaining a higher F1 score than the original model. Furthermore, we developed and made available a corpus of 228 abstracts annotated with relations between genes and phenotypes, and demonstrated how BO-LSTM can be applied to other types of relations. 
CONCLUSIONS: Our findings demonstrate that besides the high performance of current deep learning techniques, domain-specific ontologies can still be useful to mitigate the lack of labeled data.
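The idea of representing an entity by the sequence of its ontology ancestors can be sketched as follows. This is a minimal illustration, not the BO-LSTM implementation: it assumes each term has a single parent (real ontologies such as ChEBI are graphs in which a term may have several), the ordering from entity to root is an arbitrary choice, and the term labels are placeholders rather than real ontology identifiers.

```python
def ancestor_sequence(term, parent_of):
    """Return the term followed by its ancestors, walking up until no parent is known."""
    sequence = [term]
    seen = {term}
    while term in parent_of:
        term = parent_of[term]
        if term in seen:  # guard against cycles in malformed input
            break
        sequence.append(term)
        seen.add(term)
    return sequence


# Toy single-parent hierarchy with placeholder labels.
PARENT_OF = {
    "drug_x": "class_c",
    "class_c": "class_b",
    "class_b": "class_a",
    "class_a": "root",
}

print(ancestor_sequence("drug_x", PARENT_OF))
# ['drug_x', 'class_c', 'class_b', 'class_a', 'root']
```

Each element of such a sequence could then be mapped to an embedding and fed to a recurrent layer alongside the word embeddings mentioned in the abstract.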
Published on January 5, 2019

BioTransformer: a comprehensive computational tool for small molecule metabolism prediction and metabolite identification.

Authors: Djoumbou-Feunang Y, Fiamoncini J, Gil-de-la-Fuente A, Greiner R, Manach C, Wishart DS

Abstract: BACKGROUND: A number of computational tools for metabolism prediction have been developed over the last 20 years to predict the structures of small molecules undergoing biological transformation or environmental degradation. These tools were largely developed to facilitate absorption, distribution, metabolism, excretion, and toxicity (ADMET) studies, although there is now a growing interest in using such tools to facilitate metabolomics and exposomics studies. However, their use and widespread adoption are still hampered by several factors, including their limited scope, breadth of coverage, availability, and performance. RESULTS: To address these limitations, we have developed BioTransformer, a freely available software package for accurate, rapid, and comprehensive in silico metabolism prediction and compound identification. BioTransformer combines a machine learning approach with a knowledge-based approach to predict small molecule metabolism in human tissues (e.g. liver tissue), the human gut as well as the environment (soil and water microbiota), via its metabolism prediction tool. A comprehensive evaluation of BioTransformer showed that it was able to outperform two state-of-the-art commercially available tools (Meteor Nexus and ADMET Predictor), with precision and recall values up to 7 times better than those obtained for Meteor Nexus or ADMET Predictor on the same sets of pharmaceuticals, pesticides, phytochemicals or endobiotics under similar or identical constraints. Furthermore, BioTransformer was able to reproduce 100% of the transformations and metabolites predicted by the EAWAG pathway prediction system.
Using mass spectrometry data obtained from a rat experimental study with epicatechin supplementation, BioTransformer was also able to correctly identify 39 previously reported epicatechin metabolites via its metabolism identification tool, and suggest 28 potential metabolites, 17 of which matched nine monoisotopic masses for which no evidence of a previous report could be found. CONCLUSION: BioTransformer can be used as an open access command-line tool, or a software library. It is freely available. Moreover, it is also freely available as an open access RESTful application, which allows users to manually or programmatically submit queries, and retrieve metabolism predictions or compound identification data.
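The precision and recall figures quoted above compare predicted metabolites with experimentally reported ones. A minimal sketch of that calculation, assuming metabolites are compared as exact identifiers (the labels below are placeholders, and the abstract does not describe how matches were determined in the actual evaluation):

```python
def precision_recall(predicted, reported):
    """Precision and recall of a set of predicted metabolites against reported ones."""
    predicted, reported = set(predicted), set(reported)
    true_positives = len(predicted & reported)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reported) if reported else 0.0
    return precision, recall


# Placeholder identifiers: 2 of 4 predictions are among the 3 reported metabolites.
precision, recall = precision_recall({"M1", "M2", "M3", "M4"}, {"M2", "M3", "M5"})
print(precision, recall)
```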
Published on January 5, 2019

The computational prediction of drug-disease interactions using the dual-network L2,1-CMF method.

Authors: Cui Z, Gao YL, Liu JX, Wang J, Shang J, Dai LY

Abstract: BACKGROUND: Predicting drug-disease interactions (DDIs) is time-consuming and expensive. Improving the accuracy of prediction results is necessary, and it is crucial to develop a novel computing technology to predict new DDIs. The existing methods mostly use the construction of heterogeneous networks to predict new DDIs. However, the number of known interacting drug-disease pairs is small, so there will be many errors in this heterogeneous network that will interfere with the final results. RESULTS: A novel method, known as the dual-network L2,1-collaborative matrix factorization, is proposed to predict novel DDIs. The Gaussian interaction profile kernels and L2,1-norm are introduced in our method to achieve better results than other advanced methods. The network similarities of drugs and diseases with their chemical and semantic similarities are combined in this method. CONCLUSIONS: Cross validation is used to evaluate our method, and simulation experiments are used to predict new interactions using two different datasets. Finally, our prediction accuracy is better than that of other existing methods. This shows that our method is feasible and effective.
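Two ingredients named in the abstract, the L2,1-norm and the Gaussian interaction profile (GIP) kernel, have standard definitions that can be sketched briefly. The code below uses the common row-wise convention for the L2,1-norm (sum of the Euclidean norms of the rows) and the usual GIP bandwidth normalisation by the mean squared profile norm. Whether the paper uses exactly these conventions is an assumption, and the full matrix factorization objective is not reproduced here.

```python
import math


def l21_norm(matrix):
    """L2,1-norm: sum over rows of each row's Euclidean norm (row-wise convention assumed)."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of a binary interaction matrix.

    K[i][j] = exp(-gamma * ||p_i - p_j||^2), with gamma = gamma_prime divided by
    the mean squared norm of the profiles.
    """
    squared_norms = [sum(x * x for x in p) for p in profiles]
    mean_squared_norm = sum(squared_norms) / len(squared_norms)
    if mean_squared_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_squared_norm
    n = len(profiles)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


# Rows are drugs, columns are diseases (toy values).
interactions = [[1, 0], [1, 0], [0, 1]]
print(l21_norm([[3, 4], [0, 0], [5, 12]]))  # 5 + 0 + 13 = 18.0
print(gip_kernel(interactions)[0])
```

Drugs with identical interaction profiles get a kernel value of 1, and the value decays as the profiles diverge.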
Published on January 4, 2019

Identifying Protein Features Responsible for Improved Drug Repurposing Accuracies Using the CANDO Platform: Implications for Drug Design.

Authors: Mangione W, Samudrala R

Abstract: Drug repurposing is a valuable tool for combating the slowing rates of novel therapeutic discovery. The Computational Analysis of Novel Drug Opportunities (CANDO) platform performs shotgun repurposing of 2030 indications/diseases using 3733 drugs/compounds to predict interactions with 46,784 proteins and relating them via proteomic interaction signatures. The accuracy is calculated by comparing interaction similarities of drugs approved for the same indications. We performed a unique subset analysis by breaking down the full protein library into smaller subsets and then recombining the best performing subsets into larger supersets. Up to 14% improvement in accuracy is seen upon benchmarking the supersets, representing a 100-1000-fold reduction in the number of proteins considered relative to the full library. Further analysis revealed that libraries comprised of proteins with more equitably diverse ligand interactions are important for describing compound behavior. Using one of these libraries to generate putative drug candidates against malaria, tuberculosis, and large cell carcinoma results in more drugs that could be validated in the biomedical literature compared to using those suggested by the full protein library. Our work elucidates the role of particular protein subsets and corresponding ligand interactions that play a role in drug repurposing, with implications for drug design and machine learning approaches to improve the CANDO platform.
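One way to read "comparing interaction similarities of drugs approved for the same indications" is a leave-one-out check: for each drug approved for an indication, rank all other drugs by how close their interaction signatures are, and count a hit if another drug approved for the same indication appears near the top. The sketch below follows that reading. The Euclidean distance, the cutoff k, and all names are assumptions for illustration, not the CANDO code.

```python
import math


def distance(sig_a, sig_b):
    """Euclidean distance between two interaction signatures of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


def top_k_accuracy(signatures, indication_drugs, k=1):
    """Fraction of an indication's drugs whose k nearest neighbours include
    another drug approved for the same indication."""
    hits = 0
    for drug in indication_drugs:
        others = [d for d in signatures if d != drug]
        ranked = sorted(others, key=lambda d: distance(signatures[drug], signatures[d]))
        if any(d in indication_drugs for d in ranked[:k]):
            hits += 1
    return hits / len(indication_drugs) if indication_drugs else 0.0


# Toy signatures over three proteins (made-up scores).
signatures = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.9, 0.1, 0.0],
    "C": [0.0, 1.0, 0.0],
    "D": [0.0, 0.9, 0.1],
}
print(top_k_accuracy(signatures, {"A", "B"}))  # A and B are each other's nearest neighbour
```

Restricting the signature to a subset of proteins, as in the subset analysis described above, would amount to dropping columns before computing the distances.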
Published on January 4, 2019

Data-Independent Acquisition Mass Spectrometry To Quantify Protein Levels in FFPE Tumor Biopsies for Molecular Diagnostics.

Authors: Kim YJ, Sweet SMM, Egertson JD, Sedgewick AJ, Woo S, Liao WL, Merrihew GE, Searle BC, Vaske C, Heaton R, MacCoss MJ, Hembrough T

Abstract: Mass spectrometry-based protein quantitation is currently used to measure therapeutically relevant protein biomarkers in a CAP/CLIA setting to predict likely responses of known therapies. Selected reaction monitoring (SRM) is the method of choice due to its outstanding analytical performance. However, data-independent acquisition (DIA) is now emerging as a proteome-scale clinical assay. We evaluated the ability of DIA to profile the patient-specific proteomes of sample-limited tumor biopsies and to quantify proteins of interest in a targeted fashion using formalin-fixed, paraffin-embedded (FFPE) tumor biopsies (n = 12) selected from our clinical laboratory. DIA analysis on the tumor biopsies provided 3713 quantifiable proteins including actionable biomarkers currently in clinical use, successfully separated two gastric cancers from colorectal cancer specimens solely on the basis of global proteomic profiles, and identified subtype-specific proteins with prognostic or diagnostic value. We demonstrate the potential use of DIA-based quantitation to inform therapeutic decision-making using TUBB3, for which clinical cutoff expression levels have been established by SRM. Comparative analysis of DIA-based proteomic profiles and mRNA expression levels found positively and negatively correlated protein-gene pairs, a finding consistent with previously reported results from fresh-frozen tumor tissues.
Published on January 3, 2019

A retrosynthetic analysis algorithm implementation.

Authors: Watson IA, Wang J, Nicolaou CA

Abstract: The need for synthetic route design arises frequently in discovery-oriented chemistry organizations. While traditionally finding solutions to this problem has been the domain of human experts, several computational approaches, aided by algorithmic advances and the availability of large reaction collections, have recently been reported. Herein we present our own implementation of a retrosynthetic analysis method and demonstrate its capabilities in an attempt to identify synthetic routes for a collection of approved drugs. Our results indicate that the method, leveraging reaction transformation rules learned from a large patent reaction dataset, can identify multiple theoretically feasible synthetic routes and, thus, support research chemists' everyday efforts.
Published on January 1, 2019

RareLSD: a manually curated database of lysosomal enzymes associated with rare diseases.

Authors: Akhter S, Kaur H, Agrawal P, Raghava GPS

Abstract: RareLSD is a manually curated database of lysosomal enzymes associated with rare diseases that maintains comprehensive information on 63 unique lysosomal enzymes and 93 associated disorders. Each entry provides complete information on the disorder, including the name of the disease, organ affected, age of onset, available drug, inheritance pattern, defective enzyme and single nucleotide polymorphism. To help users design drugs against these diseases, we predicted and maintained structures of lysosomal enzymes. Our information portal also contains information on biochemical assays against disease-associated enzymes obtained from PubChem. Each lysosomal entry is supported by information that includes disorders, inheritance pattern, drugs, family members, active inhibitors, etc. Finally, a user-friendly web interface has been developed to help users search and browse data in RareLSD with a wide range of options. RareLSD is integrated with sequence similarity search tools (e.g. BLAST and Smith-Waterman algorithm) for analysis. It is built on responsive templates that are compatible with most browsers and screens including smartphones and gadgets (mobile, iPhone, iPad, tablets, etc.).
Published on January 1, 2019

Enhanced taxonomy annotation of antiviral activity data from ChEMBL.

Authors: Nikitina AA, Orlov AA, Kozlovskaya LI, Palyulin VA, Osolodkin DI

Abstract: The discovery of antiviral drugs is a rapidly developing area of medicinal chemistry research. The emergence of resistant variants and outbreaks of poorly studied viral diseases make this area constantly developing. The amount of antiviral activity data available in ChEMBL consistently grows, but virus taxonomy annotation of these data is not sufficient for thorough studies of antiviral chemical space. We developed a procedure for semi-automatic extraction of antiviral activity data from ChEMBL and mapped them to the virus taxonomy developed by the International Committee for Taxonomy of Viruses (ICTV). The procedure is based on the lists of virus-related values of ChEMBL annotation fields and a dictionary of virus names and acronyms mapped to ICTV taxa. Application of this data extraction procedure allows retrieving from ChEMBL 1.6 times more assays linked to 2.5 times more compounds and data points than the ChEMBL web interface allows. Mapping of these data to ICTV taxa allows analyzing all the compounds tested against each viral species. Activity values and structures of the compounds were standardized, and the antiviral activity profile was created for each standard structure. The data set compiled using this algorithm was called ViralChEMBL. As case studies, we compared descriptor and scaffold distributions for the full ChEMBL and its 'viral' and 'non-viral' subsets, identified the most studied compounds and created a self-organizing map for ViralChEMBL. Our approach to data annotation appeared to be a very efficient tool for the study of antiviral chemical space.
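The dictionary-based mapping of virus names and acronyms to ICTV taxa can be pictured as a normalised lookup. The sketch below is illustrative only: the four dictionary entries are examples chosen for this sketch, the taxon names may differ between ICTV releases, and the function names and normalisation rules are assumptions rather than the authors' procedure.

```python
import re

# Illustrative entries only; a real dictionary would be far larger and
# tied to a specific ICTV release.
VIRUS_DICTIONARY = {
    "hiv-1": "Human immunodeficiency virus 1",
    "human immunodeficiency virus 1": "Human immunodeficiency virus 1",
    "hcv": "Hepacivirus C",
    "hepatitis c virus": "Hepacivirus C",
}


def normalize(name):
    """Lower-case the name and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", name.strip().lower())


def map_to_taxon(annotation_value):
    """Return the taxon for an annotation field value, or None if it is not in the dictionary."""
    return VIRUS_DICTIONARY.get(normalize(annotation_value))


for value in ["HCV", "  Hepatitis  C virus ", "Escherichia coli"]:
    print(repr(value), "->", map_to_taxon(value))
```

Values that return None would be the ones left for manual review in a semi-automatic workflow.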