The Protein Data Bank archive as an open data resource.

Authors: Berman HM, Kleywegt GJ, Nakamura H, Markley JL

Abstract: The Protein Data Bank archive was established in 1971 and recently celebrated its 40th anniversary (Berman et al. in Structure 20:391, 2012). An analysis of the interrelationships among the science, technology and community offers further insight into how this resource evolved into one of the oldest and most widely used open-access data resources in biology.
Automatic construction of a large-scale and accurate drug-side-effect association knowledge base from biomedical literature.

Authors: Xu R, Wang Q

Abstract: Systems approaches to studying drug-side-effect (drug-SE) associations are emerging as an active research area for drug target discovery, drug repositioning, and drug toxicity prediction. However, currently available drug-SE association databases are far from complete. Herein, to increase the completeness of current drug-SE relationship resources, we present an automatic learning approach to accurately extract drug-SE pairs from the vast body of published biomedical literature, a rich source of side effect information for commercial, experimental, and even failed drugs. For the text corpus, we used 119,085,682 MEDLINE sentences and their parse trees. We used known drug-SE associations derived from US Food and Drug Administration (FDA) drug labels as prior knowledge to find relevant sentences and parse trees. We extracted syntactic patterns associated with drug-SE pairs from the resulting set of parse trees and developed pattern-ranking algorithms to prioritize drug-SE-specific patterns. We then selected a set of patterns with both high precision and high recall to extract drug-SE pairs from the entire MEDLINE corpus. In total, we extracted 38,871 drug-SE pairs from MEDLINE using the learned patterns, the majority of which have not yet been captured in FDA drug labels. On average, our knowledge-driven pattern-learning approach achieved a precision of 0.833, a recall of 0.407, and an F1 of 0.545 in extracting drug-SE pairs from MEDLINE. We compared our approach to a support vector machine (SVM)-based machine learning approach and a co-occurrence statistics-based approach. We show that the pattern-learning approach is largely complementary to the SVM- and co-occurrence-based approaches, with significantly higher precision and F1 but lower recall. Correlation analysis further showed that the extracted drug side effects correlate positively with drug targets, metabolism, and indications.
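F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R). Plugging the averaged precision (0.833) and recall (0.407) into that formula gives about 0.547, slightly above the reported 0.545. A gap in that direction is expected if, as "on average" suggests, F1 was averaged over separate evaluations, because the harmonic mean is concave and so the mean of per-evaluation F1 scores cannot exceed the F1 of the averaged precision and recall. The minimal Python sketch below illustrates this with hypothetical per-evaluation counts; it is not the authors' evaluation code.

```python
def harmonic_mean(p, r):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, harmonic_mean(precision, recall)


# Hypothetical per-evaluation counts (tp, fp, fn), for illustration only.
evaluations = [(8, 2, 10), (5, 1, 9), (9, 1, 11)]
scores = [precision_recall_f1(*counts) for counts in evaluations]
mean_p = sum(s[0] for s in scores) / len(scores)
mean_r = sum(s[1] for s in scores) / len(scores)
mean_f1 = sum(s[2] for s in scores) / len(scores)

# The mean of per-evaluation F1 never exceeds the F1 of the mean precision and recall.
print(round(mean_f1, 3), round(harmonic_mean(mean_p, mean_r), 3))
# F1 implied by the averaged precision and recall quoted in the abstract.
print(round(harmonic_mean(0.833, 0.407), 3))
```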
Machine learning-based prediction of drug-drug interactions by integrating drug phenotypic, therapeutic, chemical, and genomic properties.

Authors: Cheng F, Zhao Z

Abstract: OBJECTIVE: Drug-drug interactions (DDIs) are an important consideration in both drug development and clinical application, especially for co-administered medications. While it is necessary to identify all possible DDIs during clinical trials, DDIs are frequently reported after drugs are approved for clinical use, and they are a common cause of adverse drug reactions (ADRs) and of increased healthcare costs. Computational prediction may assist in identifying potential DDIs during clinical trials. METHODS: Here we propose a heterogeneous network-assisted inference (HNAI) framework to assist with the prediction of DDIs. First, we constructed a comprehensive DDI network containing 6946 unique DDI pairs connecting 721 approved drugs, based on DrugBank data. Next, we calculated drug-drug pair similarities using four features: phenotypic similarity based on a comprehensive drug-ADR network, therapeutic similarity based on the Anatomical Therapeutic Chemical (ATC) classification system, chemical structural similarity from SMILES data, and genomic similarity based on a large drug-target interaction network built using DrugBank and the Therapeutic Target Database. Finally, we applied five predictive models in the HNAI framework: naive Bayes, decision tree, k-nearest neighbor, logistic regression, and support vector machine. RESULTS: The HNAI models achieved an area under the receiver operating characteristic curve (AUC) of 0.67, as evaluated by fivefold cross-validation. Using antipsychotic drugs as an example, several HNAI-predicted DDIs involving weight gain and cytochrome P450 inhibition were supported by literature resources. CONCLUSIONS: Through machine learning-based integration of drug phenotypic, therapeutic, structural, and genomic similarities, we demonstrated that HNAI is promising for uncovering DDIs in drug development and postmarketing surveillance.
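The abstract names four pairwise similarity features and reports a cross-validated AUC but does not give the formulas. The Python sketch below shows one plausible way to build such a feature vector and to compute AUC, under stated assumptions: set-based features use the Tanimoto (Jaccard) coefficient, a common choice for fingerprint on-bits, ADR sets and target sets; the ATC score (fraction of the five ATC levels two codes share) is a hypothetical rule; and the two drug records and their field names are invented for illustration. AUC is computed as the probability that a positive pair outscores a negative pair, which equals the area under the ROC curve.

```python
from itertools import product


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def atc_similarity(code_a, code_b):
    """Fraction of the five ATC levels two codes share (hypothetical scoring rule)."""
    level_ends = [1, 3, 4, 5, 7]  # ATC levels end after characters 1, 3, 4, 5 and 7
    shared = sum(1 for end in level_ends if code_a[:end] == code_b[:end])
    return shared / len(level_ends)


def pair_features(drug_a, drug_b):
    """Four similarity features for a drug pair; field names are hypothetical."""
    return [
        tanimoto(drug_a["adrs"], drug_b["adrs"]),          # phenotypic (shared ADRs)
        atc_similarity(drug_a["atc"], drug_b["atc"]),       # therapeutic (ATC code)
        tanimoto(drug_a["fp_bits"], drug_b["fp_bits"]),     # chemical (fingerprint on-bits)
        tanimoto(drug_a["targets"], drug_b["targets"]),     # genomic (shared targets)
    ]


def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count one half)."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


# Invented drug records, for illustration only.
drug_a = {"adrs": {"weight gain", "sedation", "dry mouth"}, "atc": "N05AH03",
          "fp_bits": {1, 4, 9, 12}, "targets": {"DRD2", "HTR2A", "HRH1"}}
drug_b = {"adrs": {"weight gain", "sedation"}, "atc": "N05AH04",
          "fp_bits": {1, 4, 10, 12}, "targets": {"DRD2", "HTR2A"}}
print(pair_features(drug_a, drug_b))
print(roc_auc([0.9, 0.7], [0.8, 0.1]))
```

Any of the five classifiers named above could then be trained on such four-element feature vectors, with the DDI network supplying positive labels.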
Towards a foundational representation of potential drug-drug interaction knowledge.

Authors: Brochhausen M, Schneider J, Malone D, Empey PE, Hogan WR, Boyce RD

Abstract: Inadequate representation of evidence and knowledge about potential drug-drug interactions is a major factor underlying disagreements among sources of drug information that are used by clinicians. In this paper we describe the initial steps toward developing a foundational domain representation that allows tracing the evidence underlying potential drug-drug interaction knowledge. The new representation includes biological and biomedical entities represented in existing ontologies and terminologies to foster integration of data from relevant fields such as physiology, anatomy, and laboratory sciences.
Missing heritability of common diseases and treatments outside the protein-coding exome.

Authors: Sadee W, Hartmann K, Seweryn M, Pietrzak M, Handelman SK, Rempala GA

Abstract: Genetic factors strongly influence the risk of common human diseases and treatment outcomes, but the causative variants remain largely unknown; this gap has been called the 'missing heritability'. We propose several hypotheses that in combination have the potential to narrow the gap. First, given a multi-stage path from wellness to disease, we propose that common variants under positive evolutionary selection represent normal variation and gate the transition between wellness and an 'off-well' state, revealing adaptations to changing environmental conditions. In contrast, genome-wide association studies (GWAS) focus on deleterious variants conveying disease risk, which accelerate the path from off-well to illness and finally to specific diseases, while common 'normal' variants remain hidden in the noise. Second, epistasis (dynamic gene-gene interactions) likely plays a central role in adaptation and evolution; yet current GWAS analyses are poorly designed to reveal epistasis. As gene regulation is germane to adaptation, we propose that epistasis among common normal regulatory variants, or between common variants and less frequent deleterious variants, can have strong protective or deleterious phenotypic effects. These gene-gene interactions can be highly sensitive to environmental stimuli and could account for large differences in drug response between individuals. Residing largely outside the protein-coding exome, common regulatory variants affect either the transcription of coding and non-coding RNAs (regulatory SNPs, or rSNPs) or RNA functions and processing (structural RNA SNPs, or srSNPs). Third, with the vast majority of causative variants yet to be discovered, GWAS rely on surrogate markers, a confounding factor aggravated by the presence of more than one causative variant per gene and by epistasis. We propose that the confluence of these factors may be responsible to a large extent for the observed heritability gap.
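Why single-marker GWAS tests can miss epistasis can be shown with a toy two-locus model. This is not from the paper: the risk values are hypothetical, each locus is coded simply as carrier (1) or non-carrier (0), and all four genotype combinations are assumed equally common. Risk is raised only when exactly one locus carries the minor allele, so each locus shows no marginal effect even though the joint effect is strong.

```python
from itertools import product


def mean(values):
    return sum(values) / len(values)


# Hypothetical XOR-like interaction: elevated risk only when the two loci differ.
def risk(g1, g2):
    return 0.3 if g1 != g2 else 0.1


# All four carrier/non-carrier combinations, assumed equally frequent.
genotypes = list(product([0, 1], repeat=2))

# Single-marker (GWAS-style) view: mean risk by genotype at one locus at a time.
marginal_locus1 = {g: mean([risk(a, b) for a, b in genotypes if a == g]) for g in (0, 1)}
marginal_locus2 = {g: mean([risk(a, b) for a, b in genotypes if b == g]) for g in (0, 1)}

# Joint view: the interaction appears only when both loci are considered together.
joint = {(a, b): risk(a, b) for a, b in genotypes}

print(marginal_locus1, marginal_locus2)
print(joint)
```

Both marginal tables show identical mean risk for carriers and non-carriers, while the joint table shows a threefold difference between genotype combinations.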
Drug repositioning by integrating target information through a heterogeneous network model.

Authors: Wang W, Yang S, Zhang X, Li J

Abstract: MOTIVATION: The emergence of network medicine not only offers more opportunities for a better and more complete understanding of the molecular complexities of diseases, but also provides a promising tool for identifying new drug targets and establishing new relationships among diseases that enable drug repositioning. Computational approaches to drug repositioning that integrate information from multiple sources and levels have the potential to provide great insight into the complex relationships among drugs, targets, disease genes and diseases at a system level. RESULTS: In this article, we propose a computational framework based on a heterogeneous network model and apply it to drug repositioning using existing omics data on diseases, drugs and drug targets. The novelty of the framework is that the strength of a disease-drug pair is calculated by an iterative algorithm on a heterogeneous graph that also incorporates drug-target information. Comprehensive experimental results show that the proposed approach significantly outperforms several recent approaches, and case studies further illustrate its practical usefulness. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
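The abstract does not give the update rule of its iterative algorithm. One common way to score disease-drug pairs on a heterogeneous graph is guilt-by-association propagation, R <- alpha * Sd R Sr^T + (1 - alpha) * R0, where Sd and Sr are row-normalized disease and drug similarity matrices and R0 holds known associations. The pure-Python sketch below is a generic instance of that scheme, not the authors' algorithm; the damping factor alpha, the choice to derive drug similarity from shared targets (Jaccard), and the toy data are all assumptions. Because each normalized similarity matrix averages its inputs and alpha < 1, the update is a contraction and converges.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def row_normalize(m):
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propagate(disease_sim, drug_sim, known, alpha=0.4, tol=1e-10, max_iter=1000):
    """Iterate R <- alpha * Sd R Sr^T + (1 - alpha) * R0 (generic propagation sketch)."""
    sd = row_normalize(disease_sim)
    sr_t = transpose(row_normalize(drug_sim))
    r = [row[:] for row in known]
    for _ in range(max_iter):
        spread = matmul(matmul(sd, r), sr_t)
        new = [[alpha * spread[i][j] + (1 - alpha) * known[i][j] for j in range(len(r[0]))]
               for i in range(len(r))]
        delta = max(abs(new[i][j] - r[i][j]) for i in range(len(r)) for j in range(len(r[0])))
        r = new
        if delta < tol:
            break
    return r


# Hypothetical toy data: drug similarity comes from shared targets, so targets enter the graph.
targets = [{"T1", "T2"}, {"T2", "T3"}, {"T4"}]
drug_sim = [[jaccard(a, b) for b in targets] for a in targets]
disease_sim = [[1.0, 0.5], [0.5, 1.0]]
known = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # disease 0 is treated by drug 0
scores = propagate(disease_sim, drug_sim, known)
print([[round(x, 3) for x in row] for row in scores])
```

Unobserved pairs with the highest scores become repositioning candidates: here drug 1 gains a positive score for disease 0 through its shared target T2, while drug 2, which shares no targets, stays at zero.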
A curated C. difficile strain 630 metabolic network: prediction of essential targets and inhibitors.

Authors: Larocque M, Chenard T, Najmanovich R

Abstract: BACKGROUND: Clostridium difficile is the leading cause of hospital-borne infections; C. difficile infection occurs when the natural intestinal flora is depleted following antibiotic treatment. Current treatments for Clostridium difficile infections have high relapse rates, and new hyper-virulent and multi-resistant strains are emerging, making it necessary to study this nosocomial pathogen to find novel therapeutic targets. RESULTS: We present iMLTC806cdf, an extensively curated reconstructed metabolic network for the C. difficile pathogenic strain 630. iMLTC806cdf contains 806 genes, 703 metabolites, and 769 metabolic, 117 exchange and 145 transport reactions. iMLTC806cdf is the most complete and accurate metabolic reconstruction of a gram-positive anaerobic bacterium to date. We validate the model with simulated growth assays in different media and carbon sources and use it to predict essential genes. We obtain 89.2% accuracy in predicting gene essentiality when compared to experimental data for B. subtilis homologs (B. subtilis being the closest organism for which such data exist). We predict 76 essential genes and 39 essential gene pairs, a number of which are unique to C. difficile and either lack human homologs or have human homologs predicted to be non-essential. For 29 of these potential therapeutic targets, we find 125 inhibitors of homologous proteins, including approved drugs with potential for drug repositioning, that, if validated experimentally, could serve as starting points for the development of new antibiotics. CONCLUSIONS: We created a highly curated metabolic network model of C. difficile strain 630 and used it to predict essential genes as potential new therapeutic targets in the fight against Clostridium difficile infections.
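The in silico essentiality screen described above (single-gene and gene-pair knockouts) can be illustrated on a toy network. Genome-scale reconstructions such as iMLTC806cdf are usually simulated with constraint-based methods such as flux balance analysis; to keep this Python sketch dependency-free it swaps in a simpler Boolean stand-in. A knockout disables every reaction whose gene-protein-reaction (GPR) rule can no longer be satisfied, and growth is scored as whether a biomass metabolite is still reachable from the medium by network expansion. The network, gene names and GPR rules are hypothetical.

```python
from itertools import combinations

# Hypothetical toy network: reaction -> (substrates, products, GPR as a list of isozyme gene sets).
# A reaction is active if any one isozyme has all of its genes present (OR of ANDs).
REACTIONS = {
    "R1": ({"glc_ext"}, {"glc"}, [{"gA"}]),
    "R2": ({"glc"}, {"pyr"}, [{"gB"}, {"gC"}]),        # two isozymes: gB OR gC
    "R3": ({"pyr"}, {"atp"}, [{"gD", "gE"}]),          # complex: gD AND gE
    "R4": ({"glc"}, {"aa"}, [{"gF"}]),
    "BIOMASS": ({"atp", "aa"}, {"biomass"}, [set()]),  # no gene requirement
}
MEDIUM = {"glc_ext"}
GENES = {"gA", "gB", "gC", "gD", "gE", "gF"}


def reaction_active(gpr, present):
    return any(complex_genes <= present for complex_genes in gpr)


def grows(knockouts=frozenset()):
    """Boolean growth test: is 'biomass' reachable from the medium after the knockouts?"""
    present = GENES - set(knockouts)
    metabolites = set(MEDIUM)
    changed = True
    while changed:  # network expansion: fire reactions whose substrates are all available
        changed = False
        for substrates, products, gpr in REACTIONS.values():
            if (reaction_active(gpr, present) and substrates <= metabolites
                    and not (products <= metabolites)):
                metabolites |= products
                changed = True
    return "biomass" in metabolites


essential = sorted(g for g in GENES if not grows({g}))
nonessential = [g for g in GENES if g not in essential]
# Synthetic-lethal pairs: two individually non-essential genes whose joint loss blocks growth.
essential_pairs = sorted(
    pair for pair in combinations(sorted(nonessential), 2) if not grows(set(pair))
)
print(essential, essential_pairs)
```

The isozyme pair gB/gC shows how essential gene pairs arise: neither single knockout blocks growth, but the double knockout does.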
ASCL1 is a lineage oncogene providing therapeutic targets for high-grade neuroendocrine lung cancers.

Authors: Augustyn A, Borromeo M, Wang T, Fujimoto J, Shao C, Dospoy PD, Lee V, Tan C, Sullivan JP, Larsen JE, Girard L, Behrens C, Wistuba II, Xie Y, Cobb MH, Gazdar AF, Johnson JE, Minna JD

Abstract: Aggressive neuroendocrine lung cancers, including small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC), represent an understudied tumor subset that accounts for approximately 40,000 new lung cancer cases per year in the United States. No targeted therapy exists for these tumors. We determined that achaete-scute homolog 1 (ASCL1), a transcription factor required for proper development of pulmonary neuroendocrine cells, is essential for the survival of a majority of lung cancers (both SCLC and NSCLC) with neuroendocrine features. By combining whole-genome microarray expression analysis of lung cancer cell lines with ChIP-Seq data designed to identify conserved transcriptional targets of ASCL1, we discovered a 72-gene expression signature of ASCL1 targets that (i) identifies neuroendocrine differentiation in NSCLC cell lines, (ii) predicts poor prognosis in resected NSCLC specimens from three datasets, and (iii) includes novel "druggable" targets. Among these druggable targets is B-cell CLL/lymphoma 2 (BCL2), whose pharmacological inhibition stops ASCL1-dependent tumor growth in vitro and in vivo, making it a proof-of-principle ASCL1 downstream target gene. Analysis of the downstream targets of ASCL1 is thus an important advance toward targeted therapy for the neuroendocrine class of lung cancers and toward understanding and therapeutically exploiting their molecular vulnerabilities.
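The abstract does not say how the 72-gene signature was scored per sample. A common generic scheme is to z-score each gene across samples and average the z-scores of the signature genes; the Python sketch below shows that scheme under stated assumptions. The expression values and the gene names other than BCL2 are hypothetical, and signature genes missing from the data are simply skipped.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Z-score each gene across samples; expr maps gene -> list of per-sample values."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd else 0.0 for v in values]
    return out


def signature_scores(expr, signature):
    """Per-sample score = mean z-score over the signature genes present in the data."""
    z = zscore_rows(expr)
    genes = [g for g in signature if g in z]
    n_samples = len(next(iter(expr.values())))
    return [mean(z[g][s] for g in genes) for s in range(n_samples)]


# Hypothetical expression matrix for three samples.
expr = {
    "BCL2": [1.0, 2.0, 3.0],
    "TARGET_2": [2.0, 4.0, 6.0],     # hypothetical signature gene
    "OTHER_GENE": [5.0, 5.0, 5.0],   # not in the signature
}
signature = ["BCL2", "TARGET_2", "MISSING_GENE"]
print([round(s, 3) for s in signature_scores(expr, signature)])
```

Higher scores would flag samples with stronger expression of the signature; linking such scores to neuroendocrine status or prognosis would need a separate classifier or survival model.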
Measurement of small molecule binding kinetics on a protein microarray by plasmonic-based electrochemical impedance imaging.

Authors: Liang W, Wang S, Festa F, Wiktor P, Wang W, Magee M, LaBaer J, Tao N

Abstract: We report a quantitative study of small molecule binding kinetics on protein microarrays using plasmonic-based electrochemical impedance microscopy (P-EIM). P-EIM measures electrical impedance optically with high spatial resolution by converting a surface charge change into a change in surface plasmon resonance (SPR) image intensity, and the signal does not scale with the mass of the analyte. Using P-EIM, we measured the binding kinetics and affinity between small molecule drugs (imatinib and SB202190) and their target proteins (the kinases Abl1 and p38-alpha). The measured affinity values are consistent with reported values measured by an indirect competitive binding assay. We also found that SB202190 binds weakly to Abl1 (KD > 10 μM), an interaction not previously reported in the literature. Furthermore, we found that P-EIM is less prone to nonspecific binding, a long-standing issue in SPR. Our results show that P-EIM is a novel method for high-throughput measurement of small molecule binding kinetics and affinity, which is critical to understanding small molecules in biological systems and to discovering small molecule drugs.
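Binding kinetics from SPR-type sensorgrams are most often interpreted with the 1:1 Langmuir model, in which the affinity follows from the rate constants as KD = koff/kon; a KD above 10 μM, as reported for SB202190 on Abl1, means koff/kon exceeds 1e-5 M. The abstract does not state which kinetic model was fitted, so the Python sketch below is an assumption-labeled illustration of that standard model, with hypothetical rate constants rather than values from the study.

```python
import math


def association(t, conc, kon, koff, rmax):
    """1:1 Langmuir association phase: response at time t for analyte concentration conc (M)."""
    kobs = kon * conc + koff                  # observed rate constant, s^-1
    req = rmax * conc / (conc + koff / kon)   # equilibrium response; KD = koff / kon
    return req * (1 - math.exp(-kobs * t))


def dissociation(t, r0, koff):
    """Dissociation phase after the analyte is washed out: exponential decay from r0."""
    return r0 * math.exp(-koff * t)


# Hypothetical rate constants, for illustration only (not values from the study).
kon, koff = 1.0e5, 1.0e-3   # M^-1 s^-1, s^-1
kd = koff / kon             # 1e-8 M, i.e. 10 nM
print(kd)
print(round(association(600.0, kd, kon, koff, 100.0), 2))  # response after 10 min at C = KD
```

At an analyte concentration equal to KD the equilibrium response is half of rmax, which is one quick sanity check when fitting measured curves.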
Structure-based prediction of drug distribution across the headgroup and core strata of a phospholipid bilayer using surrogate phases.

Authors: Natesan S, Lukacova V, Peng M, Subramaniam R, Lynch S, Wang Z, Tandlich R, Balaz S

Abstract: Solvation of drugs in the core (C) and headgroup (H) strata of phospholipid bilayers affects their physiological transport rates and accumulation. These characteristics, especially a complete drug distribution profile across the bilayer strata, are tedious to obtain experimentally, to the point that even simplified preferred locations are available for only a few dozen compounds. Recently, we showed that the partition coefficient (P) values in the system of hydrated diacetyl phosphatidylcholine (DAcPC) and n-hexadecane (C16), as surrogates of the H- and C-strata of the bilayer composed of the most abundant mammalian phospholipid, PC, agree well with the preferred bilayer location of compounds. High P values are typical of lipophiles accumulating in the core, and low P values are characteristic of cephalophiles preferring the headgroups. This simple pattern does not hold for most compounds, which usually have a more even distribution and may also accumulate at the H/C interface. To model the complete distribution, correlates of solvation energies are needed for each drug state in the bilayer: (1) for the H-stratum, the DAcPC/W P value (W for water), calculated as the ratio of the C16/W and C16/DAcPC P values; (2) for the C-stratum, the C16/W P value; (3) for the H/C interface, the P values of all plausible molecular poses, characterized using the fragment DAcPC/W and C16/W solvation parameters for the parts of the molecule embedded in the H- and C-strata, respectively. The correlates, each scaled by two Collander coefficients, were used in a nonlinear, mass-balance-based model of intrabilayer distribution, which was applied to the easily measurable overall P values of compounds in DMPC (M = myristoyl) bilayers and monolayers as the dependent variables.
The calibrated model for 107 neutral compounds explains 94% of the experimental variance, achieves similar cross-validation performance, and agrees well with the nontrivial, experimentally determined bilayer locations of 27 compounds. The resulting structure-based prediction system for intrabilayer distribution will facilitate more realistic modeling of passive transport and of drug interactions with integral membrane proteins whose binding sites are located in the bilayer, such as some enzymes, influx and efflux transporters, and receptors. If only overall bilayer accumulation is of interest, the 1-octanol/W P values suffice to model the studied set.
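The H-stratum correlate follows from a thermodynamic cycle: P(C16/DAcPC) = c(C16)/c(DAcPC) and P(C16/W) = c(C16)/c(W), so dividing the second by the first cancels c(C16) and leaves c(DAcPC)/c(W) = P(DAcPC/W). The Python sketch below shows that step, a Collander scaling log P(stratum) = a log P(surrogate) + b, and a deliberately simplified linear two-stratum mass balance, P(overall) = f_H P_H + (1 - f_H) P_C, where f_H is the headgroup volume fraction. The linear balance ignores the interface poses and the nonlinear terms of the authors' model, and the Collander coefficients, volume fraction and input values are hypothetical.

```python
import math


def p_dacpc_water(p_c16_water, p_c16_dacpc):
    """DAcPC/W partition coefficient from the C16/W and C16/DAcPC values (thermodynamic cycle)."""
    return p_c16_water / p_c16_dacpc


def collander(p_surrogate, a, b):
    """Collander relation: log P_stratum = a * log P_surrogate + b (a, b fitted per stratum)."""
    return 10 ** (a * math.log10(p_surrogate) + b)


def overall_p(p_head, p_core, f_head):
    """Volume-weighted mass balance over two strata (simplified: no interface, no saturation)."""
    return f_head * p_head + (1 - f_head) * p_core


# Hypothetical compound and coefficients, for illustration only.
p_c16_w, p_c16_dacpc = 100.0, 10.0
p_h = collander(p_dacpc_water(p_c16_w, p_c16_dacpc), a=1.0, b=0.0)
p_c = collander(p_c16_w, a=1.0, b=0.0)
print(p_h, p_c, overall_p(p_h, p_c, f_head=0.5))
```

In a calibration like the one described above, the Collander coefficients would be fitted so that predicted overall P values match the measured DMPC values across the compound set.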