Publications Search
Explore how scientists all over the world use DrugBank in their research.
Published on March 22, 2018

Heterogeneous network propagation for herb target identification.

Authors: Yang K, Liu G, Wang N, Zhang R, Yu J, Chen J, Zhou X

Abstract: BACKGROUND: Identifying targets of herbs is a primary step for investigating pharmacological mechanisms of herbal drugs in Traditional Chinese medicine (TCM). Experimental identification of herb targets is difficult and time-consuming. Computational methods for identifying herb targets offer an efficient alternative. However, how to make full use of heterogeneous network data about herbs and targets to improve the performance of herb target prediction remains an open problem. METHODS: In our study, a random walk algorithm on the heterogeneous herb-target network (named heNetRW) has been proposed to identify protein targets of herbs. By building a heterogeneous herb-target network involving herbs, targets and their interactions and simulating a random walk on the network, the candidate targets of a given herb can be predicted. RESULTS: The experimental results on a large-scale dataset showed that heNetRW achieved higher target prediction performance than PRINCE (improved F1-score by 0.08 and Hit@1 by 21.34% in one validation setting, and improved F1-score by 0.54 and Hit@1 by 69.08% in the other validation setting). Furthermore, we evaluated novel candidate targets of two herbs (rhizoma coptidis and turmeric), which showed our approach could generate potential targets that are valuable for further experimental investigations. CONCLUSIONS: Compared with the PRINCE algorithm, the heNetRW algorithm can fuse more known information (such as known herb-target associations and pathway-based similarities of protein pairs) to improve prediction performance. Experimental results also indicated heNetRW had higher performance than PRINCE. The prediction results not only can be used to guide the selection of candidate targets of herbs, but also help to reveal the molecular mechanisms of herbal drugs.
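The random-walk idea in this abstract can be illustrated with a minimal sketch of a generic random walk with restart on a toy herb-target graph. This is not the heNetRW implementation: the node names, the edges, the restart probability, and the function name are all hypothetical, and the paper's network construction and transition weights are not reproduced here.

```python
def random_walk_with_restart(edges, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk that restarts at `seed`."""
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Start with all probability mass on the seed node (assumed to be in the graph).
    prob = {node: 0.0 for node in neighbors}
    prob[seed] = 1.0

    for _ in range(max_iter):
        nxt = {node: 0.0 for node in neighbors}
        # Each node passes (1 - restart) of its mass evenly to its neighbors.
        for node, mass in prob.items():
            share = (1.0 - restart) * mass / len(neighbors[node])
            for nb in neighbors[node]:
                nxt[nb] += share
        # The remaining mass jumps back to the seed.
        nxt[seed] += restart
        delta = sum(abs(nxt[n] - prob[n]) for n in neighbors)
        prob = nxt
        if delta < tol:
            break
    return prob


# Hypothetical toy network: two herbs, three targets, one target-target link.
edges = [
    ("herb_A", "target_1"),
    ("herb_A", "target_2"),
    ("herb_B", "target_2"),
    ("herb_B", "target_3"),
    ("target_1", "target_2"),
]
scores = random_walk_with_restart(edges, "herb_A")

# Rank targets by visiting probability; higher means closer to the seed herb.
ranked_targets = sorted(
    (n for n in scores if n.startswith("target_")),
    key=lambda n: scores[n],
    reverse=True,
)
print(ranked_targets)
```

In this toy setting, target_3 is not linked to herb_A directly, yet it still receives a nonzero score through herb_B and target_2, which is the kind of indirect evidence a network propagation method can surface as a candidate.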
Published on March 21, 2018

A novel adaptive ensemble classification framework for ADME prediction.

Authors: Yang M, Chen J, Xu L, Shi X, Zhou X, Xi Z, An R, Wang X

Abstract: It has now become clear that in silico prediction of ADME (absorption, distribution, metabolism, and elimination) characteristics is an important component of the drug discovery process. Therefore, there has been considerable interest in the development of in silico modeling of ADME prediction in recent years. Despite the advances in this field, there remain challenges when facing the unbalanced and high dimensionality problems simultaneously. In this work, we introduce a novel adaptive ensemble classification framework named AECF to deal with the above issues. AECF includes four components: (1) data balancing, (2) generating individual models, (3) combining individual models, and (4) optimizing the ensemble. We considered five sampling methods, seven base modeling techniques, and ten ensemble rules to build a choice pool. The proper route of constructing predictive models was determined automatically according to the imbalance ratio (IR). Because of its adaptive characteristics, AECF can be used on different kinds of ADME data, and balanced data is a special case in AECF. We evaluated the performance of our approach using five extensive ADME datasets concerning Caco-2 cell permeability (CacoP), human intestinal absorption (HIA), oral bioavailability (OB), and P-glycoprotein (P-gp) binders (substrates/inhibitors, PS/PI). The performance of AECF was evaluated on two independent datasets, and the average AUC values were 0.8574-0.8602, 0.8968-0.9182, 0.7821-0.7981, 0.8139-0.8311, and 0.8874-0.8898 for CacoP, HIA, OB, PS and PI, respectively. Our results show that AECF can provide better performance and generality compared with individual models and two representative ensemble methods, bagging and boosting. Furthermore, the degree of complementarity among the AECF ensemble members was investigated for the purpose of elucidating the potential advantages of our framework.
We found that AECF can effectively select complementary members to construct predictive models by our auto-adaptive optimization approach, and that the additional diversity in both sample and feature space mainly contributes to the complementarity of ensemble members.
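Two of the four components named above, data balancing and combining individual models, can be sketched in a few lines. This is an illustration under stated assumptions, not AECF itself: the abstract does not define the imbalance ratio or name its sampling methods and ensemble rules, so the majority-over-minority definition of IR, random undersampling, majority voting, and all function names below are choices made for this example only.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_undersample(samples, labels, rng):
    """Drop majority-class samples at random until every class has equal size."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            balanced.append((x, y))
    return balanced


def majority_vote(predictions):
    """Combine the predictions of several models for one sample by majority vote."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data: 2 positives and 8 negatives (values are made up for illustration).
samples = list(range(10))
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

ir = imbalance_ratio(labels)
balanced = random_undersample(samples, labels, random.Random(0))
vote = majority_vote([1, 0, 1])

print(ir, len(balanced), vote)
```

A framework that picks its route from the imbalance ratio would branch on a value like `ir` here; where that threshold sits is not stated in the abstract, so none is assumed.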
Published on March 19, 2018

In silico drug combination discovery for personalized cancer therapy.

Authors: Jeon M, Kim S, Park S, Lee H, Kang J

Abstract: BACKGROUND: Drug combination therapy, which is considered as an alternative to single drug therapy, can potentially reduce resistance and toxicity, and have synergistic efficacy. As drug combination therapies are widely used in the clinic for hypertension, asthma, and AIDS, they have also been proposed for the treatment of cancer. However, it is difficult to select and experimentally evaluate effective combinations because not only is the number of cancer drug combinations extremely large but also the effectiveness of drug combinations varies depending on the genetic variation of cancer patients. A computational approach that prioritizes the best drug combinations considering the genetic information of a cancer patient is necessary to reduce the search space. RESULTS: We propose an in-silico method for personalized drug combination therapy discovery. We predict the synergy between two drugs and a cell line using genomic information, targets of drugs, and pharmacological information. We calculate and predict the synergy scores of 583 drug combinations for 31 cancer cell lines. For feature dimension reduction, we select the mutations or expression levels of the genes in cancer-related pathways. We also used various machine learning models. Extremely Randomized Trees (ERT), a tree-based ensemble model, achieved the best performance in the synergy score prediction regression task. The correlation coefficient between the synergy scores predicted by ERT and the actual observations is 0.738. To compare with an existing drug combination synergy classification model, we reformulate the problem as a binary classification problem by thresholding the synergy scores. ERT achieved an F1 score of 0.954 when synergy scores of 20 and -20 were used as the threshold, which is 8.7% higher than that obtained by the state-of-the-art baseline model. 
Moreover, the model correctly predicts the most synergistic combination, from approximately 100 candidate drug combinations, as the top choice for 15 out of the 31 cell lines. For 28 out of the 31 cell lines, the model predicts the most synergistic combination in the top 10 of approximately 100 candidate drug combinations. Finally, we analyze the results, generate synergistic rules using the features, and validate the rules through the literature survey. CONCLUSION: Using various types of genomic information of cancer cell lines, targets of drugs, and pharmacological information, a drug combination synergy prediction pipeline is proposed. The pipeline regresses the synergy level between two drugs and a cell line as well as classifies if there exists synergy or antagonism between them. Discovering new drug combinations by our pipeline may improve personalized cancer therapy.
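The reformulation of the regression task as binary classification can be sketched as follows. The abstract states only that synergy scores of 20 and -20 were used as thresholds; treating scores at or above 20 as synergistic, scores at or below -20 as antagonistic, and dropping pairs whose observed score falls in between is an assumption made for this example. The scores themselves are made-up toy values, not data from the paper.

```python
def to_class(score, upper=20.0, lower=-20.0):
    """Map a synergy score to 1 (synergistic), 0 (antagonistic), or None (in between)."""
    if score >= upper:
        return 1
    if score <= lower:
        return 0
    return None


def f1_score(y_true, y_pred, positive=1):
    """F1 score of the positive class, computed from true/false positives and negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy observed and predicted synergy scores (hypothetical values).
observed = [35.0, -42.0, 5.0, 28.0, -25.0, 60.0]
predicted = [30.0, -30.0, 10.0, -22.0, -21.0, 45.0]

# Keep only pairs whose observed score is clearly synergistic or antagonistic.
pairs = [
    (to_class(o), to_class(p))
    for o, p in zip(observed, predicted)
    if to_class(o) is not None
]
y_true = [t for t, _ in pairs]
y_pred = [p for _, p in pairs]

f1 = f1_score(y_true, y_pred)
print(round(f1, 3))
```

The same predicted scores can feed both evaluations described in the abstract: correlation against the raw observations for the regression task, and F1 after thresholding for the classification task.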
Published on March 15, 2018

Rare variants in drug target genes contributing to complex diseases, phenome-wide.

Authors: Verma SS, Josyula N, Verma A, Zhang X, Veturi Y, Dewey FE, Hartzel DN, Lavage DR, Leader J, Ritchie MD, Pendergrass SA

Abstract: The DrugBank database consists of ~800 genes that are well-characterized drug targets. This list of genes is a useful resource for association testing. For example, loss of function (LOF) genetic variation has the potential to mimic the effect of drugs, and high impact variation in these genes can impact downstream traits. Identifying novel associations between genetic variation in these genes and a range of diseases can also uncover new uses for the drugs that target these genes. Phenome Wide Association Studies (PheWAS) have been successful in identifying genetic associations across hundreds of thousands of diseases. We have conducted a novel gene-based PheWAS to test the effect of rare variants in DrugBank genes, evaluating associations between these genes and more than 500 quantitative and dichotomous phenotypes. We used whole exome sequencing data from 38,568 samples in the Geisinger MyCode Community Health Initiative. We evaluated the results of this study when binning rare variants using various filters based on potential functional impact. We identified multiple novel associations, and the majority of the significant associations were driven by functionally annotated variation. Overall, this study provides a sweeping exploration of rare variant associations within functionally relevant genes across a wide range of diagnoses.
Published on March 12, 2018

IMPPAT: A curated database of Indian Medicinal Plants, Phytochemistry And Therapeutics.

Authors: Mohanraj K, Karthikeyan BS, Vivek-Ananth RP, Chand RPB, Aparna SR, Mangalapandi P, Samal A

Abstract: Phytochemicals of medicinal plants encompass a diverse chemical space for drug discovery. India is rich with a flora of indigenous medicinal plants that have been used for centuries in traditional Indian medicine to treat human maladies. A comprehensive online database on the phytochemistry of Indian medicinal plants will enable computational approaches towards natural product based drug discovery. In this direction, we present IMPPAT, a manually curated database of 1742 Indian Medicinal Plants, 9596 Phytochemicals, And 1124 Therapeutic uses spanning 27074 plant-phytochemical associations and 11514 plant-therapeutic associations. Notably, the curation effort led to a non-redundant in silico library of 9596 phytochemicals with standard chemical identifiers and structure information. Using cheminformatic approaches, we have computed the physicochemical, ADMET (absorption, distribution, metabolism, excretion, toxicity) and drug-likeliness properties of the IMPPAT phytochemicals. We show that the stereochemical complexity and shape complexity of IMPPAT phytochemicals differ from libraries of commercial compounds or diversity-oriented synthesis compounds while being similar to other libraries of natural products. Within IMPPAT, we have filtered a subset of 960 potential druggable phytochemicals, of which the majority have no significant similarity to existing FDA approved drugs, thus rendering them good candidates for prospective drugs. The IMPPAT database is openly accessible at: https://cb.imsc.res.in/imppat .
Published on March 9, 2018

Comparative assessment of strategies to identify similar ligand-binding pockets in proteins.

Authors: Govindaraj RG, Brylinski M

Abstract: BACKGROUND: Detecting similar ligand-binding sites in globally unrelated proteins has a wide range of applications in modern drug discovery, including drug repurposing, the prediction of side effects, and drug-target interactions. Although a number of techniques to compare binding pockets have been developed, this problem still poses significant challenges. RESULTS: We evaluate the performance of three algorithms to calculate similarities between ligand-binding sites, APoc, SiteEngine, and G-LoSA. Our assessment considers not only the capabilities to identify similar pockets and to construct accurate local alignments, but also the dependence of these alignments on the sequence order. We point out certain drawbacks of previously compiled datasets, such as the inclusion of structurally similar proteins, leading to an overestimated performance. To address these issues, a rigorous procedure to prepare unbiased, high-quality benchmarking sets is proposed. Further, we conduct a comparative assessment of techniques directly aligning binding pockets to indirect strategies employing structure-based virtual screening with AutoDock Vina and rDock. CONCLUSIONS: Thorough benchmarks reveal that G-LoSA offers a fairly robust overall performance, whereas the accuracy of APoc and SiteEngine is satisfactory only against easy datasets. Moreover, combining various algorithms into a meta-predictor improves the performance of existing methods to detect similar binding sites in unrelated proteins by 5-10%. All data reported in this paper are freely available at https://osf.io/6ngbs/ .
Published on March 8, 2018

Comparing Mechanistic and Preclinical Predictions of Volume of Distribution on a Large Set of Drugs.

Authors: Chan R, De Bruyn T, Wright M, Broccatelli F

Abstract: PURPOSE: Volume of distribution at steady state (Vdss) is a fundamental pharmacokinetic (PK) parameter driven predominantly by passive processes and physicochemical properties of the compound. Human Vdss can be estimated using in silico mechanistic methods or empirically scaled from Vdss values obtained from preclinical species. In this study the accuracy and the complementarity of these two approaches are analyzed leveraging a large data set (over 150 marketed drugs). METHODS: For all the drugs analyzed in this study experimental in vitro measurements of LogP, plasma protein binding and pKa are used as input for the mechanistic in silico model to predict human Vdss. The software used for predicting human tissue partition coefficients and Vdss based on the method described by Rodgers and Rowland is made available as supporting information. RESULTS: This assessment indicates that overall the in silico mechanistic model presented by Rodgers and Rowland is comparably accurate or superior to empirical approaches based on the extrapolation of in vivo data from preclinical species. CONCLUSIONS: These results illustrate the great potential of mechanistic in silico models to accurately predict Vdss in humans. This in silico method does not rely on in vivo data and is, consequently, significantly time and resource sparing. The success of this in silico model further suggests that reasonable predictability of Vdss in preclinical species could be obtained by a similar process.
Published on March 8, 2018

Data Sets Representative of the Structures and Experimental Properties of FDA-Approved Drugs.

Authors: Douguet D

Abstract: Presented here are several data sets that gather information collected from the labels of the FDA approved drugs: their molecular structures and those of the described active metabolites, their associated pharmacokinetics and pharmacodynamics data, and the history of their marketing authorization by the FDA. To date, 1852 chemical structures have been identified with a molecular weight less than 2000, of which 492 are or have active metabolites. To promote the sharing of data, the original web server was upgraded for browsing the database and downloading the data sets (http://chemoinfo.ipmc.cnrs.fr/edrug3d). It is believed that the multidimensional chemistry-oriented collections are an essential resource for a thorough analysis of the current drug chemical space. The data sets are envisioned as being used in a wide range of endeavors that include drug repurposing, drug design, privileged structures analyses, structure-activity relationship studies, and the improvement of absorption, distribution, metabolism, and elimination predictive models.
Published on March 2, 2018

Integrative analysis of omics summary data reveals putative mechanisms underlying complex traits.

Authors: Wu Y, Zeng J, Zhang F, Zhu Z, Qi T, Zheng Z, Lloyd-Jones LR, Marioni RE, Martin NG, Montgomery GW, Deary IJ, Wray NR, Visscher PM, McRae AF, Yang J

Abstract: The identification of genes and regulatory elements underlying the associations discovered by GWAS is essential to understanding the aetiology of complex traits (including diseases). Here, we demonstrate an analytical paradigm of prioritizing genes and regulatory elements at GWAS loci for follow-up functional studies. We perform an integrative analysis that uses summary-level SNP data from multi-omics studies to detect DNA methylation (DNAm) sites associated with gene expression and phenotype through shared genetic effects (i.e., pleiotropy). We identify pleiotropic associations between 7858 DNAm sites and 2733 genes. These DNAm sites are enriched in enhancers and promoters, and >40% of them are mapped to distal genes. Further pleiotropic association analyses, which link both the methylome and transcriptome to 12 complex traits, identify 149 DNAm sites and 66 genes, indicating a plausible mechanism whereby the effect of a genetic variant on phenotype is mediated by genetic regulation of transcription through DNAm.
Published on March 1, 2018

Drug-drug interaction extraction via hierarchical RNNs on sequence and shortest dependency paths.

Authors: Zhang Y, Zheng W, Lin H, Wang J, Yang Z, Dumontier M

Abstract: Motivation: Adverse events resulting from drug-drug interactions (DDI) pose a serious health issue. The ability to automatically extract DDIs described in the biomedical literature could further efforts for ongoing pharmacovigilance. Most neural network-based methods focus on the sentence sequence to identify these DDIs; however, the shortest dependency path (SDP) between the two entities contains valuable syntactic and semantic information. Effectively exploiting such information may improve DDI extraction. Results: In this article, we present a hierarchical recurrent neural networks (RNNs)-based method to integrate the SDP and sentence sequence for the DDI extraction task. Firstly, the sentence sequence is divided into three subsequences. Then, the bottom RNNs model is employed to learn the feature representation of the subsequences and SDP, and the top RNNs model is employed to learn the feature representation of both sentence sequence and SDP. Furthermore, we introduce the embedding attention mechanism to identify and enhance keywords for the DDI extraction task. We evaluate our approach using the DDI extraction 2013 corpus. Our method is competitive or superior in performance as compared with other state-of-the-art methods. Experimental results show that the sentence sequence and SDP are complementary to each other. Integrating the sentence sequence with SDP can effectively improve the DDI extraction performance. Availability and implementation: The experimental data is available at https://github.com/zhangyijia1979/hierarchical-RNNs-model-for-DDI-extraction. Contact: zhyj@dlut.edu.cn or michel.dumontier@maastrichtuniversity.nl. Supplementary information: Supplementary data are available at Bioinformatics online.
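For readers unfamiliar with the shortest dependency path, the sketch below shows one common way to obtain it: a breadth-first search over the dependency tree treated as an undirected graph. It is not the authors' code. The sentence, the placeholder entity names DRUG1 and DRUG2, and the dependency edges are hand-written for illustration and were not produced by a parser.

```python
from collections import deque


def shortest_dependency_path(edges, start, goal):
    """Breadth-first search over an undirected dependency graph of token indices."""
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, []).append(dep)
        graph.setdefault(dep, []).append(head)

    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nb in graph.get(node, []):
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    return None  # the two tokens are not connected


# Hypothetical sentence with hand-written (head, dependent) edges.
tokens = ["DRUG1", "increases", "the", "effect", "of", "DRUG2"]
edges = [
    (1, 0),  # increases -> DRUG1
    (1, 3),  # increases -> effect
    (3, 2),  # effect -> the
    (3, 5),  # effect -> DRUG2
    (5, 4),  # DRUG2 -> of
]

sdp = [tokens[i] for i in shortest_dependency_path(edges, 0, 5)]
print(sdp)
```

The path skips function words such as "the" and "of" and keeps the words that connect the two entities syntactically, which is why it can complement the full sentence sequence as a model input.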