Publications Search
Explore how scientists all over the world use DrugBank in their research.
Published on February 6, 2021

Current and prospective computational approaches and challenges for developing COVID-19 vaccines.

Authors: Hwang W, Lei W, Katritsis NM, MacMahon M, Chapman K, Han N

Abstract: SARS-CoV-2, which causes COVID-19, was first identified in humans in late 2019 and is a coronavirus of zoonotic origin. As it has spread around the world, there has been an unprecedented effort to develop effective vaccines. Computational methods can be used to speed up the long and costly process of vaccine development. Antigen selection, epitope prediction, and toxicity and allergenicity prediction are areas in which computational tools have already been applied as part of reverse vaccinology for SARS-CoV-2 vaccine development. However, there is potential for computational methods to assist further. We review approaches which have been used and highlight additional bioinformatic approaches and PK modelling as in silico approaches which may be useful for SARS-CoV-2 vaccine design but currently remain unexplored. As more novel viruses with pandemic potential are expected to arise in future, these techniques are not limited to SARS-CoV-2 but are also useful for responding rapidly to novel emerging viruses.
Published on February 5, 2021

Coupled mixed model for joint genetic analysis of complex disorders with two independently collected data sets.

Authors: Wang H, Pei F, Vanyukov MM, Bahar I, Wu W, Xing EP

Abstract: BACKGROUND: In the last decade, genome-wide association studies (GWASs) have contributed to decoding the human genome by uncovering many genetic variations associated with various diseases. Many follow-up investigations involve joint analysis of multiple independently generated GWAS data sets. While most of the computational approaches developed for joint analysis are based on summary statistics, joint analysis based on individual-level data with consideration of confounding factors remains a challenge. RESULTS: In this study, we propose a method, called Coupled Mixed Model (CMM), that enables a joint GWAS analysis on two independently collected sets of GWAS data with different phenotypes. The CMM method does not require the data sets to have the same phenotypes, as it aims to infer the unknown phenotypes using a set of multivariate sparse mixed models. Moreover, CMM addresses the confounding variables due to population stratification, family structures, and cryptic relatedness, as well as those arising during data collection, such as batch effects, that frequently appear in joint genetic studies. We evaluate the performance of CMM using simulation experiments. In real data analysis, we illustrate the utility of CMM by applying it to evaluate common genetic associations for Alzheimer's disease and substance use disorder, using data sets independently collected for the two complex human disorders. Comparison of the results with those from previous experiments and analyses supports the utility of our method and provides new insights into the diseases. The software is available at https://github.com/HaohanWang/CMM.
Published on February 5, 2021

Genetic ancestry plays a central role in population pharmacogenomics.

Authors: Yang HC, Chen CW, Lin YT, Chu SK

Abstract: Recent studies have pointed out the essential role of genetic ancestry in population pharmacogenetics. In this study, we analyzed the whole-genome sequencing data from The 1000 Genomes Project (Phase 3) and the pharmacogenetic information from DrugBank, PharmGKB, PharmaADME, and Biotransformation. Here we show that ancestry-informative markers are enriched in pharmacogenetic loci, suggesting that trans-ancestry differentiation must be carefully considered in population pharmacogenetics studies. Ancestry-informative pharmacogenetic loci are located in both protein-coding and non-protein-coding regions, illustrating that a whole-genome analysis is necessary for an unbiased examination of pharmacogenetic loci. Finally, those ancestry-informative pharmacogenetic loci that target multiple drugs are often functional variants, which reflects their importance in biological functions and pathways. In summary, we develop an efficient algorithm for an ultrahigh-dimensional principal component analysis. We create genetic catalogs of ancestry-informative markers and genes. We explore pharmacogenetic patterns and establish a high-accuracy prediction panel of genetic ancestry. Moreover, we construct a genetic ancestry pharmacogenomic database, Genetic Ancestry PhD (http://hcyang.stat.sinica.edu.tw/databases/genetic_ancestry_phd/).
Published on February 4, 2021

Prediction of adverse drug reactions based on knowledge graph embedding.

Authors: Zhang F, Sun B, Diao X, Zhao W, Shu T

Abstract: BACKGROUND: Adverse drug reactions (ADRs) are an important concern in the medication process and can pose a substantial economic burden for patients and hospitals. Because of the limitations of clinical trials, it is difficult to identify all possible ADRs of a drug before it is marketed. We developed a new model based on data mining technology to predict potential ADRs from available drug data. METHOD: Based on the Word2Vec model in Natural Language Processing, we propose a new knowledge graph embedding method that embeds drugs and ADRs into their respective vectors and builds a logistic regression classification model to predict whether a given drug will have ADRs. RESULT: First, a new knowledge graph embedding method was proposed, and comparison with similar studies showed that our model not only had high prediction accuracy but also had a simpler model structure. In our experiments, the AUC of the classification model reached a maximum of 0.87, and the mean AUC was 0.863. CONCLUSION: In this paper, we introduce a new method that embeds a knowledge graph to vectorize drugs and ADRs, then uses a logistic regression classification model to predict whether there is a causal relationship between them. The experiment showed that knowledge graph embedding can effectively encode drugs and ADRs, and that the proposed ADR prediction system is also effective.
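As an illustration of the AUC figures quoted in this abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores. It is not the authors' code: the function name `roc_auc` and the toy scores are hypothetical, and the sketch uses the standard pairwise-ranking definition of AUC rather than any particular library.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise-ranking definition.

    AUC equals the probability that a randomly chosen positive example
    (label 1, e.g. a true drug-ADR pair) receives a higher score than a
    randomly chosen negative example (label 0); ties count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores from a classifier for four drug-ADR pairs.
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [1, 0, 1, 0]
example_auc = roc_auc(example_scores, example_labels)
```

The quadratic loop is chosen for readability; for large test sets a rank-based computation would be more efficient.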
Published on February 4, 2021

Generative chemistry: drug discovery with deep learning generative models.

Authors: Bian Y, Xie XQ

Abstract: The de novo design of molecular structures using deep learning generative models offers an encouraging solution to drug discovery in the face of the continuously increasing cost of new drug development. From the generation of original texts, images, and videos to the design of novel molecular structures from scratch, the creativity of deep learning generative models exhibits the height machine intelligence can achieve. The purpose of this paper is to review the latest advances in generative chemistry, which relies on generative modeling to expedite the drug discovery process. This review starts with a brief history of artificial intelligence in drug discovery to outline this emerging paradigm. Commonly used chemical databases, molecular representations, and tools in cheminformatics and machine learning are covered as the infrastructure for generative chemistry. The review then focuses on detailed discussions of cutting-edge generative architectures for compound generation, including recurrent neural networks, variational autoencoders, adversarial autoencoders, and generative adversarial networks. Challenges and future perspectives follow.
Published on February 3, 2021

Repurposing potential of Ayurvedic medicinal plants derived active principles against SARS-CoV-2 associated target proteins revealed by molecular docking, molecular dynamics and MM-PBSA studies.

Authors: Kumar Verma A, Kumar V, Singh S, Goswami BC, Camps I, Sekar A, Yoon S, Lee KW

Abstract: All the plants and their secondary metabolites used in the present study were obtained from Ayurveda, with historical roots in the Indian subcontinent. The selected secondary metabolites have been experimentally validated and reported as potent antiviral agents against genetically close human viruses. The plants have also been used as folk medicine to treat cold, cough, asthma, bronchitis, and severe acute respiratory syndrome in India and across the globe since time immemorial. The present study aimed to assess the repurposing possibility of potent antiviral compounds with SARS-CoV-2 target proteins and also with the host-specific receptor and activator protease that facilitate viral entry into the host body. Molecular docking (MDc) was performed to study the molecular affinities of antiviral compounds with the aforesaid target proteins. The top-scoring conformations identified through docking analysis were further validated by a 100 ns molecular dynamics (MD) simulation run. The stability of the conformation was studied in detail by investigating the binding free energy using the MM-PBSA method. Finally, the binding affinities of all the compounds were also compared with a reference ligand, remdesivir, against the target protein RdRp. Additionally, pharmacophore features, 3D structure alignment of potent compounds and a Bayesian machine learning model were also used to support the MDc and MD simulation. Overall, the study emphasized that curcumin possesses a strong binding ability with the host-specific proteins furin and ACE2. In contrast, gingerol has shown strong interactions with spike protein and RdRp, and quercetin with the main protease (M(pro)) of SARS-CoV-2. All these target proteins play an essential role in mediating viral replication, and therefore compounds targeting them are expected to block viral replication and transcription. Overall, gingerol, curcumin and quercetin possess multitarget binding ability and can be used alone or in combination to enhance therapeutic efficacy against COVID-19. The obtained results encourage further in vitro and in vivo investigations and also support the traditional preventive use of antiviral plants.
Published on February 3, 2021

A Multi-Objective Approach for Drug Repurposing in Preeclampsia.

Authors: Tejera E, Perez-Castillo Y, Chamorro A, Cabrera-Andrade A, Sanchez ME

Abstract: Preeclampsia is a hypertensive disorder that occurs during pregnancy. It is a complex disease with unknown pathogenesis and the leading cause of fetal and maternal mortality during pregnancy. Using all drugs currently under clinical trial for preeclampsia, we extracted all their possible targets from the DrugBank and ChEMBL databases and labeled them as "targets". The proteins labeled as "off-targets" were extracted in the same way, but taking as query molecules all antihypertensive drugs that are ACE inhibitors and/or angiotensin receptor antagonists. Classification models were obtained for each of the 55 total proteins (45 targets and 10 off-targets) using the TPOT pipeline optimization tool. The average accuracy of the models in predicting the external dataset for targets and off-targets was 0.830 and 0.850, respectively. The combinations of models maximizing their virtual screening performance were explored by combining the desirability function and genetic algorithms. The virtual screening performance metrics for the best model were: the Boltzmann-Enhanced Discrimination of ROC (BEDROC, alpha = 160.9) = 0.258, the Enrichment Factor at 1% (EF 1%) = 31.55 and the Area Under the Accumulation Curve (AUAC) = 0.831. The most relevant targets for preeclampsia were AR, VDR, SLC6A2, NOS3 and CHRM4, while ABCG2, ERBB2, CES1 and REN were the most relevant off-targets. A virtual screening of the DrugBank database identified estradiol, estriol, vitamins E and D, lynestrenol, mifepristone, simvastatin, ambroxol, and some antibiotics and antiparasitics as drugs with potential application in the treatment of preeclampsia.
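To make the enrichment factor reported in this abstract concrete, here is a minimal sketch of the usual definition: the hit rate among the top-ranked fraction of a screened library divided by the hit rate in the whole library. The function name `enrichment_factor` and the toy data are hypothetical, and the abstract does not state how the authors handled ties or rounding, so those choices are assumptions.

```python
import math


def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at the given top fraction of a ranked library.

    scores: predicted scores, higher meaning more likely active.
    labels: 1 for known actives, 0 for decoys/inactives.
    fraction: top fraction of the ranked list to inspect (0.01 for EF 1%).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(scores)
    if n == 0 or n != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("at least one active compound is required")

    # Assumption: round the cutoff up so at least one compound is inspected.
    n_top = max(1, math.ceil(fraction * n))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_actives = sum(label for _, label in ranked[:n_top])

    # Hit rate in the top slice relative to the hit rate in the full library.
    return (top_actives / n_top) / (total_actives / n)


# Toy library of ten compounds, two of which are active and ranked first.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_ef = enrichment_factor(toy_scores, toy_labels, fraction=0.5)
```

An EF of 1 corresponds to random ranking, so the reported EF at 1% of 31.55 means actives were about 31 times more frequent in the top 1% than in the library as a whole.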
Published on February 2, 2021

Identification of Novel Population-Specific Cell Subsets in Chinese Ulcerative Colitis Patients Using Single-Cell RNA Sequencing.

Authors: Li G, Zhang B, Hao J, Chu X, Wiestler M, Cornberg M, Xu CJ, Liu X, Li Y

Abstract: BACKGROUND & AIMS: Genome-wide association studies (GWAS) and transcriptome analyses have been performed to better understand the pathogenesis of ulcerative colitis (UC). However, current studies mainly focus on European ancestry, highlighting a great need to identify the key genes, pathways and cell types in colonic mucosal cells of adult UC patients from other ancestries. Here we aimed to identify key genes and cell types in the colonic mucosa of UC patients. METHODS: We performed single-cell RNA sequencing (scRNA-seq) analysis of 12 colon biopsies of UC patients and healthy controls of Chinese Han ancestry. RESULTS: Two novel plasma subsets were identified. Five epithelial/stromal and three immune cell subsets show significant differences in abundance between inflamed and non-inflamed samples. In general, UC risk genes show consistent expression alteration in immune cells of both inflamed and non-inflamed tissues. As one of the exceptions, IgA defection, marking the signal of immune dysfunction, is specific to the inflamed area. Moreover, Th17-derived activation was observed in both the epithelial cell lineage and the immune cell lineage of UC patients as compared to controls, suggesting a systemic change of immune activities driven by Th17. The UC risk genes show enrichment in progenitors, glial cells and immune cells, and drug-target genes are differentially expressed in antigen-presenting cells. CONCLUSIONS: Our work identifies novel population-specific plasma cell molecular signatures of UC. The transcriptional signature of UC is shared in immune cells from both inflamed and non-inflamed tissues, whereas the transcriptional response to disease is a local effect only in inflamed epithelial/stromal cells.
Published on February 2, 2021

A Machine Learning-Based Biological Drug-Target Interaction Prediction Method for a Tripartite Heterogeneous Network.

Authors: Zheng Y, Wu Z

Abstract: Drug repositioning is the identification of interactions between drugs and target proteins in pharmaceutical sciences. Traditional large-scale validation through chemical experiments is time-consuming and expensive, while drug repositioning can drastically decrease the cost and duration of traditional drug development. With the rapid advancement of high-throughput technologies and the explosion of various biological and medical data, computational drug repositioning methods have been used to systematically identify potential drug-target interactions. Some of them are based on a particular class of machine learning algorithms called kernel methods. In this paper, we propose a new machine learning prediction method that combines multiple kernels into a tripartite heterogeneous drug-target-disease interaction space in order to integrate multiple sources of biological information simultaneously. This novel network algorithm extends the traditional drug-target interaction bipartite graph with a third, disease layer. Gaussian kernel functions on heterogeneous networks and the regularized least squares method of the Kronecker product are used to predict new drug-target interactions. The values of AUPR (the area under the precision-recall curve) and AUC (the area under the receiver operating characteristic curve) of the proposed algorithm are significantly improved. In particular, the AUC values are improved to 0.99, 0.99, 0.97, and 0.96 on four benchmark data sets. These experimental results substantiate that the network topology can be used for predicting drug-target interactions.
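The abstract does not give the form of its Gaussian kernel functions. One common choice in the drug-target interaction literature is a Gaussian kernel over binary interaction profiles, where each drug is described by its row of the drug-target adjacency matrix. The sketch below illustrates that idea only; the function name `gaussian_profile_kernel`, the bandwidth normalization and the toy profiles are assumptions, not details taken from the paper.

```python
import math


def gaussian_profile_kernel(profiles, gamma_scale=1.0):
    """Gaussian kernel matrix over interaction profiles.

    profiles: list of equal-length 0/1 vectors, one per drug (or target),
              e.g. rows of a drug-target adjacency matrix.
    gamma_scale: bandwidth scale; the effective bandwidth is divided by the
                 mean squared norm of the profiles (an assumed normalization).
    Returns K with K[i][j] = exp(-gamma * ||profile_i - profile_j||^2).
    """
    n = len(profiles)
    if n == 0:
        raise ValueError("at least one profile is required")
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("profiles must contain at least one interaction")
    gamma = gamma_scale / mean_sq_norm

    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Toy example: three drugs described by their interactions with three targets.
toy_profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
toy_kernel = gaussian_profile_kernel(toy_profiles)
```

Drugs with identical interaction profiles receive a similarity of 1, and similarity decays as the profiles diverge. A kernel of this kind could then be fed to a regularized least squares predictor, which is outside the scope of this sketch.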
Published in January 2021

Covid-19.bioreproducibility.org: A web resource for SARS-CoV-2-related structural models.

Authors: Brzezinski D, Kowiel M, Cooper DR, Cymborowski M, Grabowski M, Wlodawer A, Dauter Z, Shabalin IG, Gilski M, Rupp B, Jaskolski M, Minor W

Abstract: The COVID-19 pandemic has triggered numerous scientific activities aimed at understanding the SARS-CoV-2 virus and ultimately developing treatments. Structural biologists have already determined hundreds of experimental X-ray, cryo-EM, and NMR structures of proteins and nucleic acids related to this coronavirus, and this number is still growing. To help biomedical researchers, who may not necessarily be experts in structural biology, navigate through the flood of structural models, we have created an online resource, covid19.bioreproducibility.org, that aggregates expert-verified information about SARS-CoV-2-related macromolecular models. In this article, we describe this web resource along with the suite of tools and methodologies used for assessing the structures presented therein.