Publications Search
Explore how scientists all over the world use DrugBank in their research.
Published on August 10, 2018

Machine Learning Based Toxicity Prediction: From Chemical Structural Description to Transcriptome Analysis.

Authors: Wu Y, Wang G

Abstract: Toxicity prediction is very important to public health. Among its many applications, toxicity prediction is essential to reduce the cost and labor of a drug's preclinical and clinical trials, because a lot of drug evaluations (cellular, animal, and clinical) can be spared due to the predicted toxicity. In the era of Big Data and artificial intelligence, toxicity prediction can benefit from machine learning, which has been widely used in many fields such as natural language processing, speech recognition, image recognition, computational chemistry, and bioinformatics, with excellent performance. In this article, we review machine learning methods that have been applied to toxicity prediction, including deep learning, random forests, k-nearest neighbors, and support vector machines. We also discuss the input parameter to the machine learning algorithm, especially its shift from chemical structural description only to that combined with human transcriptome data analysis, which can greatly enhance prediction accuracy.
Published on August 8, 2018

Discovering Health Benefits of Phytochemicals with Integrated Analysis of the Molecular Network, Chemical Properties and Ethnopharmacological Evidence.

Authors: Yoo S, Kim K, Nam H, Lee D

Abstract: Identifying the health benefits of phytochemicals is an essential step in drug and functional food development. While many in vitro screening methods have been developed to identify the health effects of phytochemicals, there is still room for improvement because of high cost and low productivity. Therefore, researchers have alternatively proposed in silico methods, primarily based on three types of approaches: utilizing molecular, chemical or ethnopharmacological information. Although each approach has its own strength in analyzing the characteristics of phytochemicals, previous studies have not considered them all together. Here, we apply an integrated in silico analysis to identify the potential health benefits of phytochemicals based on molecular analysis and chemical properties as well as ethnopharmacological evidence. From the molecular analysis, we found an average of 415.6 health effects for 591 phytochemicals. We further investigated ethnopharmacological evidence of phytochemicals and found that on average 129.1 (31%) of the predicted health effects had ethnopharmacological evidence. Lastly, we investigated chemical properties to confirm whether they are orally bio-available, drug available or effective on certain tissues. The evaluation results indicate that the health effects can be predicted more accurately by cooperatively considering the molecular analysis, chemical properties and ethnopharmacological evidence.
Published on August 3, 2018

Phenotype-oriented network analysis for discovering pharmacological effects of natural compounds.

Authors: Yoo S, Nam H, Lee D

Abstract: Although natural compounds have provided a wealth of leads and clues in drug development, the process of identifying their pharmacological effects is still a challenging task. Over the last decade, many in vitro screening methods have been developed to identify the pharmacological effects of natural compounds, but they are still costly processes with low productivity. Therefore, in silico methods, primarily based on molecular information, have been proposed. However, large-scale analysis is rarely considered, since many natural compounds do not have molecular structure and target protein information. Empirical knowledge of medicinal plants can be used as a key resource to solve the problem, but this information is not fully exploited and is used only as a preliminary tool for selecting plants for specific diseases. Here, we introduce a novel method to identify pharmacological effects of natural compounds from herbal medicine based on phenotype-oriented network analysis. In this study, medicinal plants with similar efficacy were clustered by investigating hierarchical relationships between the known efficacy of plants and 5,021 phenotypes in the phenotypic network. We then discovered significantly enriched natural compounds in each plant cluster and mapped the averaged pharmacological effects of the plant cluster to the natural compounds. This approach allows us to predict unexpected effects of natural compounds that have not been found by molecular analysis. When applied to verified medicinal compounds, our method successfully identified their pharmacological effects with high specificity and sensitivity.
Published on August 1, 2018

A global network of biomedical relationships derived from text.

Authors: Percha B, Altman RB

Abstract: Motivation: The biomedical community's collective understanding of how chemicals, genes and phenotypes interact is distributed across the text of over 24 million research articles. These interactions offer insights into the mechanisms behind higher order biochemical phenomena, such as drug-drug interactions and variations in drug response across individuals. To assist their curation at scale, we must understand what relationship types are possible and map unstructured natural language descriptions onto these structured classes. We used NCBI's PubTator annotations to identify instances of chemical, gene and disease names in Medline abstracts and applied the Stanford dependency parser to find connecting dependency paths between pairs of entities in single sentences. We combined a published ensemble biclustering algorithm (EBC) with hierarchical clustering to group the dependency paths into semantically-related categories, which we annotated with labels, or 'themes' ('inhibition' and 'activation', for example). We evaluated our theme assignments against six human-curated databases: DrugBank, Reactome, SIDER, the Therapeutic Target Database, OMIM and PharmGKB. Results: Clustering revealed 10 broad themes for chemical-gene relationships, 7 for chemical-disease, 10 for gene-disease and 9 for gene-gene. In most cases, enriched themes corresponded directly to known database relationships. Our final dataset, represented as a network, contained 37 491 thematically-labeled chemical-gene edges, 2 021 192 chemical-disease edges, 136 206 gene-disease edges and 41 418 gene-gene edges, each representing a single-sentence description of an interaction from somewhere in the literature. Availability and implementation: The complete network is available on Zenodo (https://zenodo.org/record/1035500). We have also provided the full set of dependency paths connecting biomedical entities in Medline abstracts, with associated sentences, for future use by the biomedical research community. Supplementary information: Supplementary data are available at Bioinformatics online.
Published in July 2018

Impact of Neurodegenerative Diseases on Drug Binding to Brain Tissues: From Animal Models to Human Samples.

Authors: Ugarte A, Corbacho D, Aymerich MS, Garcia-Osta A, Cuadrado-Tejedor M, Oyarzabal J

Abstract: Drug efficacy in the central nervous system (CNS) requires an additional step after crossing the blood-brain barrier. Therapeutic agents must reach their targets in the brain to modulate them; thus, the free drug concentration hypothesis is a key parameter for in vivo pharmacology. Here, we report the impact of neurodegeneration (Alzheimer's disease (AD) and Parkinson's disease (PD) compared with healthy controls) on the binding of 10 known drugs to postmortem brain tissues from animal models and humans. Unbound drug fractions, for some drugs, are significantly different between healthy and injured brain tissues (AD or PD). In addition, drugs binding to brain tissues from AD and PD animal models do not always recapitulate their binding to the corresponding human injured brain tissues. These results reveal potentially relevant implications for CNS drug discovery.
Published on July 31, 2018

Tissue-specific Network Analysis of Genetic Variants Associated with Coronary Artery Disease.

Authors: Miao X, Chen X, Xie Z, Lin H

Abstract: Coronary artery disease (CAD) is a leading cause of death worldwide. Recent genome-wide association studies have identified more than one hundred susceptibility loci associated with CAD. However, the underlying mechanism of these genetic loci to CAD susceptibility is still largely unknown. We performed a tissue-specific network analysis of CAD using the summary statistics from one of the largest genome-wide association studies. Variant-level associations were summarized into gene-level associations, and a CAD-related interaction network was built using experimentally validated gene interactions and gene coexpression in coronary artery. The network contained 102 genes, of which 53 were significantly associated with CAD. Pathway enrichment analysis revealed that many genes in the network were involved in the regulation of peripheral arteries. In summary, we performed a tissue-specific network analysis and found abnormalities in the peripheral arteries might be an important pathway underlying the pathogenesis of CAD. Future functional characterization might further validate our findings and identify potential therapeutic targets for CAD.
Published in July 2018

Semi-Mechanistic Model for Predicting the Dosing Rate in Children and Neonates for Drugs Mainly Eliminated by Cytochrome Metabolism.

Authors: Cerruti L, Bleyzac N, Tod M

Abstract: BACKGROUND AND OBJECTIVE: A simple approach is proposed to predict drug clearance in children when no paediatric data are available for drugs metabolised by cytochromes. METHODS: The maturation functions of cytochrome activity and binding proteins in plasma were combined with several measures of body size to describe drug clearance increase with age. The complete model and different reduced models were evaluated on a large panel of drug clearance data in children. The parameters of the models were estimated by nonlinear regression. Bias and precision of predictions were determined. RESULTS: Two hundred and ten clearance ratios were available for the analysis, corresponding to 53 drugs mainly eliminated by cytochrome metabolism. The age range was 1.5 days to 16 years and there were 30 values for children aged less than 2 years. Fat-free mass at power 0.75 yielded better results than the other body size descriptors tested. The model with the best fit was based on the fat-free mass ratio, the unbound fraction ratio, maturation functions for cytochromes and no maturation function for clearance by other routes. In children aged less than 2 years, the predictive performances were much better with the final model than with the model based on body surface area. The final model was almost unbiased. CONCLUSIONS: This model allows the calculation of the maintenance dose of drugs eliminated mainly by cytochromes. After external validation, it could be used in children aged less than 2 years. In older children, the model reduces to a simple approach based on body surface area or preferably on fat-free mass at power 0.75. The model is not suitable for preterm neonates.
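Illustrative note: the abstract above names the ingredients of the best-fitting model (fat-free mass ratio at power 0.75, unbound fraction ratio, maturation of cytochromes, no maturation for other routes) but does not give its equation. The Python sketch below shows one way those ingredients might be combined, assuming a simple multiplicative form. The function name, argument names, and the multiplicative structure are hypothetical and are not taken from the paper; maturation values are passed in as plain numbers between 0 and 1 because the abstract does not specify the maturation functions.

```python
def clearance_ratio(ffm_child_kg, ffm_adult_kg, fu_child, fu_adult,
                    cyp_fractions, cyp_maturation):
    """Child-to-adult clearance ratio under an ASSUMED multiplicative model.

    cyp_fractions:  fraction of adult clearance carried by each cytochrome,
                    e.g. {"CYP3A4": 0.8} (hypothetical input format).
    cyp_maturation: fraction of adult activity reached by each cytochrome
                    at the child's age, between 0 and 1.
    """
    # Body-size term: fat-free mass ratio raised to the power 0.75.
    size_term = (ffm_child_kg / ffm_adult_kg) ** 0.75
    # Plasma protein binding term: ratio of unbound fractions.
    binding_term = fu_child / fu_adult
    # Cytochrome routes are scaled by their maturation.
    metabolic = sum(frac * cyp_maturation[cyp]
                    for cyp, frac in cyp_fractions.items())
    # Remaining routes are assumed fully mature (no maturation function).
    other = 1.0 - sum(cyp_fractions.values())
    return size_term * binding_term * (metabolic + other)


if __name__ == "__main__":
    # Made-up inputs for illustration only; not data from the study.
    ratio = clearance_ratio(
        ffm_child_kg=8.0, ffm_adult_kg=56.0,
        fu_child=0.12, fu_adult=0.10,
        cyp_fractions={"CYP3A4": 0.8},
        cyp_maturation={"CYP3A4": 0.5},
    )
    print(f"Predicted child/adult clearance ratio: {ratio:.3f}")
```

Under this assumed form, a maintenance dosing rate for a child would be the adult dosing rate multiplied by the returned ratio. When every cytochrome is fully mature and the unbound fractions match, the expression reduces to the fat-free mass term alone, which mirrors the abstract's remark about older children.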
Published in July 2018

Tissue-Specific Analysis of Pharmacological Pathways.

Authors: Hao Y, Quinnies K, Realubit R, Karan C, Tatonetti NP

Abstract: Understanding the downstream consequences of pharmacologically targeted proteins is essential to drug design. Current approaches investigate molecular effects under tissue-naive assumptions. Many target proteins, however, have tissue-specific expression. A systematic study connecting drugs to target pathways in in vivo human tissues is needed. We introduced a data-driven method that integrates drug-target relationships with gene expression, protein-protein interaction, and pathway annotation data. We applied our method to four independent genomewide expression datasets and built 467,396 connections between 1,034 drugs and 954 pathways in 259 human tissues or cell lines. We validated our results using data from L1000 and Pharmacogenomics Knowledgebase (PharmGKB), and observed high precision and recall. We predicted and tested anticoagulant effects of 22 compounds experimentally that were previously unknown, and used clinical data to validate these effects retrospectively. Our systematic study provides a better understanding of the cellular response to drugs and can be applied to many research topics in systems pharmacology.
Published in July 2018

Patient similarity by joint matrix trifactorization to identify subgroups in acute myeloid leukemia.

Authors: Vitali F, Marini S, Pala D, Demartini A, Montoli S, Zambelli A, Bellazzi R

Abstract: Objective: Computing patients' similarity is of great interest in precision oncology since it supports clustering and subgroup identification, eventually leading to tailored therapies. The availability of large amounts of biomedical data, characterized by large feature sets and sparse content, motivates the development of new methods to compute patient similarities able to fuse heterogeneous data sources with the available knowledge. Materials and Methods: In this work, we developed a data integration approach based on matrix trifactorization to compute patient similarities by integrating several sources of data and knowledge. We assess the accuracy of the proposed method: (1) on several synthetic data sets whose similarity structures are affected by increasing levels of noise and data sparsity, and (2) on a real data set coming from an acute myeloid leukemia (AML) study. The results obtained are finally compared with those of traditional similarity calculation methods. Results: In the analysis of the synthetic data set, where the ground truth is known, we measured the capability of reconstructing the correct clusters, while in the AML study we evaluated the Kaplan-Meier curves obtained with the different clusters and measured their statistical difference by means of the log-rank test. In the presence of noise and sparse data, our data integration method outperforms other techniques, both in the synthetic and in the AML data. Discussion: In case of multiple heterogeneous data sources, a matrix trifactorization technique can successfully fuse all the information in a joint model. We demonstrated how this approach can be efficiently applied to discover meaningful patient similarities and therefore may be considered a reliable data-driven strategy for the definition of new research hypotheses for precision oncology. Conclusion: The better performance of the proposed approach presents an advantage over previous methods to provide accurate patient similarities supporting precision medicine.
Published in July 2018

The Cellosaurus, a Cell-Line Knowledge Resource.

Authors: Bairoch A

Abstract: The Cellosaurus is a knowledge resource on cell lines. It aims to describe all cell lines used in biomedical research. Its scope encompasses both vertebrates and invertebrates. Currently, information for >100,000 cell lines is provided. For each cell line, it provides a wealth of information, cross-references, and literature citations. The Cellosaurus is available on the ExPASy server (https://web.expasy.org/cellosaurus/) and can be downloaded in a variety of formats. Among its many uses, the Cellosaurus is a key resource to help researchers identify potentially contaminated/misidentified cell lines, thus contributing to improving the quality of research in the life sciences.