Published on July 1, 2013

A multi-site feasibility study for personalized medicine in canines with osteosarcoma.

Authors: Monks NR, Cherba DM, Kamerling SG, Simpson H, Rusk AW, Carter D, Eugster E, Mooney M, Sigler R, Steensma M, Grabinski T, Marotti KR, Webb CP

Abstract: BACKGROUND: A successful therapeutic strategy, specifically tailored to the molecular constitution of an individual and their disease, is an ambitious objective of modern medicine. In this report, we highlight a feasibility study in canine osteosarcoma focused on refining the infrastructure and processes required for prospective clinical trials using a series of gene expression-based Personalized Medicine (PMed) algorithms to predict suitable therapies within 5 days of sample receipt. METHODS: Tumor tissue samples were collected immediately following limb amputation and shipped overnight from veterinary practices. Upon receipt (day 1), RNA was extracted from snap-frozen tissue, with an adjacent H&E section for pathological diagnosis. Samples passing RNA and pathology QC were shipped to a CLIA-certified laboratory for genomic profiling. After mapping of canine probe sets to human genes and normalization against a (normal) reference set, gene-level Z-scores were submitted to the PMed algorithms. The resulting PMed report was immediately forwarded to the veterinarians. Upon receipt and review of the PMed report, feedback from the practicing veterinarians was captured. RESULTS: 20 subjects were enrolled over a 5-month period. Tissue from 13 subjects passed both histological and RNA QC and was submitted for genomic analysis and subsequent PMed analysis and report generation. PMed reports for 11 of the 13 samples were communicated to the veterinarian within the target 5 business days. Of the 7 samples that failed QC, 5 were due to poor RNA quality, whereas 2 failed following pathological review. Comments from the practicing veterinarians were generally positive and constructive, highlighting a number of areas for improvement, including enhanced education regarding PMed report interpretation, drug availability, affordable pricing and suitable canine dosing.
CONCLUSIONS: This feasibility trial demonstrated that, with the appropriate infrastructure and processes, it is possible to perform an in-depth molecular analysis of a patient's tumor in support of real-time therapeutic decision making within 5 days of sample receipt. A number of areas for improvement have been identified that should reduce the level of sample attrition and support clinical decision making.
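
The normalization step in the METHODS above (gene-level Z-scores against a normal reference set) can be illustrated with a minimal sketch. The data layout, the gene names and the function `gene_z_scores` are assumptions made for illustration; the abstract does not describe the authors' actual pipeline or its handling of edge cases.

```python
from statistics import mean, stdev


def gene_z_scores(sample, reference):
    """Return per-gene Z-scores of one sample against a reference set.

    sample:    dict gene -> expression value for one tumor (hypothetical layout)
    reference: dict gene -> list of expression values in normal tissue
    """
    z = {}
    for gene, value in sample.items():
        ref = reference.get(gene)
        if ref is None or len(ref) < 2:
            continue  # spread cannot be estimated from fewer than two values
        sd = stdev(ref)
        if sd == 0:
            continue  # a constant reference makes the Z-score undefined
        z[gene] = (value - mean(ref)) / sd
    return z


# Hypothetical gene names and values, for illustration only.
reference = {"GENE_A": [4.0, 5.0, 6.0], "GENE_B": [2.0, 2.0, 2.0]}
sample = {"GENE_A": 7.0, "GENE_B": 3.0, "GENE_C": 1.0}
scores = gene_z_scores(sample, reference)
```

Here only `GENE_A` receives a score: `GENE_B` has a constant reference and `GENE_C` has no reference values at all, so both are skipped in this sketch.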
Published in June 2013

Proteome-wide prediction of self-interacting proteins based on multiple properties.

Authors: Liu Z, Guo F, Zhang J, Wang J, Lu L, Li D, He F

Abstract: Self-interacting proteins, whose two or more copies can interact with each other, play important roles in cellular functions and the evolution of protein interaction networks (PINs). Knowing whether a protein can self-interact can contribute to, and is sometimes crucial for, the elucidation of its functions. Previous related research has mainly focused on the structures and functions of specific self-interacting proteins, whereas knowledge of their overall properties is limited. Meanwhile, the two most common current high-throughput protein interaction assays have limited ability to detect self-interactions because of biological artifacts and design limitations, and bioinformatic methods for predicting self-interacting proteins are lacking. This study aims to systematically study and predict self-interacting proteins from an overall perspective. We find that, compared with other proteins, self-interacting proteins contain more domains in the structural aspect; tend to be conserved and ancient in the evolutionary aspect; are significantly enriched with enzyme genes, housekeeping genes, and drug targets in the functional aspect; and tend to occupy important positions in PINs in the topological aspect. Furthermore, based on these features, after feature selection, we use logistic regression to integrate six representative features, including Gene Ontology term, domain, paralogous interactor, enzyme, model organism self-interacting protein, and betweenness centrality in the PIN, to develop a proteome-wide prediction model of self-interacting proteins. Using 5-fold cross-validation and an independent test, this model shows good performance. Finally, the prediction model is developed into a user-friendly web service SLIPPER (SeLf-Interacting Protein PrEdictoR).
Users may submit a list of proteins, and then SLIPPER will return the probability scores measuring their possibility to be self-interacting proteins and various related annotation information. This work helps us understand the role self-interacting proteins play in cellular functions from an overall perspective, and the constructed prediction model may contribute to the high throughput finding of self-interacting proteins and provide clues for elucidating their functions.
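
As a rough illustration of how logistic regression turns the six features named above into a probability score, consider the sketch below. The feature keys, the weights and the bias are hypothetical placeholders; they are not the coefficients of the SLIPPER model, which the abstract does not report.

```python
import math

# Keys mirror the six features listed in the abstract; the naming is an assumption.
FEATURES = [
    "go_term",
    "domain",
    "paralogous_interactor",
    "enzyme",
    "model_organism_sip",
    "betweenness",
]


def self_interaction_probability(features, weights, bias):
    """Logistic model: sigmoid of a weighted sum of the feature values."""
    score = bias + sum(weights[name] * features[name] for name in FEATURES)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights and one hypothetical protein, for illustration only.
weights = {name: 0.5 for name in FEATURES}
protein = {name: 1.0 for name in FEATURES}
p_neutral = self_interaction_probability(protein, {n: 0.0 for n in FEATURES}, 0.0)
p_positive = self_interaction_probability(protein, weights, 0.0)
```

With all weights at zero the model is uninformative and returns 0.5; positive weights on present features push the probability above 0.5.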
Published in June 2013

How Far Could We Go with Open Data - A Case Study for TRPV1 Antagonists.

Authors: Tsareva DA, Ecker GF

Abstract: Publicly open databases of small compounds have become an indispensable tool for chemoinformaticians for collection and preparation of datasets suitable for drug discovery questions. Since these databases comprise compounds coming from structure-activity relationship (SAR) studies performed by different research groups, they are very diverse with respect to the biological assays used. In the present study we analyzed the applicability of a thoroughly curated dataset gathered from open sources for ligand-based studies, using the transient receptor potential vanilloid type 1 (TRPV1) as a use case. Thorough curation of compounds according to the biological assay type and conditions led to a dataset of comparable bioactive chemicals. Subsequent exhaustive analysis of the obtained dataset using classification algorithms demonstrated that the models obtained in most of the cases possess reliable quality. Analysis of constantly misclassified compounds showed that they belong to local SAR series, where small changes in structure lead to different class labels. These small structural differences could not be captured by the classification algorithms. However, application of the 3D alignment-independent QSAR technique GRIND to local, structurally related series overcomes this problem.
Published in June 2013

eFindSite: improved prediction of ligand binding sites in protein models using meta-threading, machine learning and auxiliary ligands.

Authors: Brylinski M, Feinstein WP

Abstract: Molecular structures and functions of the majority of proteins across different species are yet to be identified. Much needed functional annotation of these gene products often benefits from the knowledge of protein-ligand interactions. Towards this goal, we developed eFindSite, an improved version of FINDSITE, designed to more efficiently identify ligand binding sites and residues using only weakly homologous templates. It employs a collection of effective algorithms, including highly sensitive meta-threading approaches, improved clustering techniques, advanced machine learning methods and reliable confidence estimation systems. Depending on the quality of target protein structures, eFindSite outperforms geometric pocket detection algorithms by 15-40 % in binding site detection and by 5-35 % in binding residue prediction. Moreover, compared to FINDSITE, it identifies 14 % more binding residues in the most difficult cases. When multiple putative binding pockets are identified, the ranking accuracy is 75-78 %, which can be further improved by 3-4 % by including auxiliary information on binding ligands extracted from biomedical literature. As a first across-genome application, we describe structure modeling and binding site prediction for the entire proteome of Escherichia coli. Carefully calibrated confidence estimates strongly indicate that highly reliable ligand binding predictions are made for the majority of gene products, thus eFindSite holds a significant promise for large-scale genome annotation and drug development projects. eFindSite is freely available to the academic community at .
Published in June 2013

Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review.

Authors: Csermely P, Korcsmaros T, Kiss HJ, London G, Nussinov R

Abstract: Despite considerable progress in genome- and proteome-based high-throughput screening methods and in rational drug design, the increase in approved drugs in the past decade did not match the increase of drug development costs. Network description and analysis not only give a systems-level understanding of drug action and disease complexity, but can also help to improve the efficiency of drug design. We give a comprehensive assessment of the analytical tools of network topology and dynamics. The state-of-the-art use of chemical similarity, protein structure, protein-protein interaction, signaling, genetic interaction and metabolic networks in the discovery of drug targets is summarized. We propose that network targeting follows two basic strategies. The "central hit strategy" selectively targets central nodes/edges of the flexible networks of infectious agents or cancer cells to kill them. The "network influence strategy" works against other diseases, where an efficient reconfiguration of rigid networks needs to be achieved by targeting the neighbors of central nodes/edges. It is shown how network techniques can help in the identification of single-target, edgetic, multi-target and allo-network drug target candidates. We review the recent boom in network methods helping hit identification, lead selection optimizing drug efficacy, as well as minimizing side-effects and drug toxicity. Successful network-based drug development strategies are shown through the examples of infections, cancer, metabolic diseases, neurodegenerative diseases and aging. Summarizing >1200 references we suggest an optimized protocol of network-aided drug development, and provide a list of systems-level hallmarks of drug quality. Finally, we highlight network-related drug development trends helping to achieve these hallmarks by a cohesive, global approach.
Published on June 26, 2013

Integrative relational machine-learning for understanding drug side-effect profiles.

Authors: Bresso E, Grisoni R, Marchetti G, Karaboga AS, Souchet M, Devignes MD, Smail-Tabbone M

Abstract: BACKGROUND: Drug side effects represent a common reason for stopping drug development during clinical trials. Improving our ability to understand drug side effects is necessary to reduce attrition rates during drug development as well as the risk of discovering novel side effects in available drugs. Today, most investigations deal with isolated side effects and overlook possible redundancy and their frequent co-occurrence. RESULTS: In this work, drug annotations are collected from SIDER and DrugBank databases. Terms describing individual side effects reported in SIDER are clustered with a semantic similarity measure into term clusters (TCs). Maximal frequent itemsets are extracted from the resulting drug x TC binary table, leading to the identification of what we call side-effect profiles (SEPs). A SEP is defined as the longest combination of TCs which are shared by a significant number of drugs. Frequent SEPs are explored on the basis of integrated drug and target descriptors using two machine learning methods: decision trees and inductive-logic programming. Although both methods yield explicit models, the inductive-logic programming method performs relational learning and is able to exploit not only drug properties but also background knowledge. Learning efficiency is evaluated by cross-validation and direct testing with new molecules. Comparison of the two machine-learning methods shows that the inductive-logic programming method displays a greater sensitivity than decision trees and successfully exploits background knowledge such as functional annotations and pathways of drug targets, thereby producing rich and expressive rules. All models and theories are available on a dedicated web site. CONCLUSIONS: Side-effect profiles covering a significant number of drugs have been extracted from a drug x side-effect association table.
Integration of background knowledge concerning both chemical and biological spaces has been combined with a relational learning method for discovering rules which explicitly characterize drug-SEP associations. These rules are successfully used for predicting SEPs associated with new drugs.
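
The notion of a side-effect profile as a maximal frequent itemset of a drug x TC binary table can be sketched with a brute-force search. This is only a toy illustration under stated assumptions: the drug names, term-cluster labels and support threshold are hypothetical, and the exhaustive enumeration shown here is exponential in the number of TCs, so it is not the mining algorithm the authors used.

```python
from itertools import combinations


def maximal_frequent_itemsets(drug_to_tcs, min_support):
    """Return itemsets shared by >= min_support drugs that have no frequent superset.

    drug_to_tcs: dict drug -> set of term clusters (one row of the binary table)
    """
    items = sorted(set().union(*drug_to_tcs.values()))
    frequent = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # Support = number of drugs annotated with every TC in the candidate.
            support = sum(1 for tcs in drug_to_tcs.values() if candidate <= tcs)
            if support >= min_support:
                frequent.append(candidate)
    # Keep only itemsets that are not a proper subset of another frequent itemset.
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical drug x TC table, for illustration only.
table = {
    "drug_a": {"TC1", "TC2", "TC3"},
    "drug_b": {"TC1", "TC2"},
    "drug_c": {"TC1", "TC2", "TC4"},
    "drug_d": {"TC3"},
}
profiles = maximal_frequent_itemsets(table, min_support=2)
```

In this toy table the profiles are {TC1, TC2}, shared by three drugs, and {TC3}, shared by two; the single TCs TC1 and TC2 are frequent but not maximal, so they are dropped.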
Published on June 25, 2013

Predicting the protein targets for athletic performance-enhancing substances.

Authors: Mavridis L, Mitchell JB

Abstract: BACKGROUND: The World Anti-Doping Agency (WADA) publishes the Prohibited List, a manually compiled international standard of substances and methods prohibited in-competition, out-of-competition and in particular sports. It would be ideal to be able to identify all substances that have one or more performance-enhancing pharmacological actions in an automated, fast and cost effective way. Here, we use experimental data derived from the ChEMBL database (~7,000,000 activity records for 1,300,000 compounds) to build a database model that takes into account both structure and experimental information, and use this database to predict both on-target and off-target interactions between these molecules and targets relevant to doping in sport. RESULTS: The ChEMBL database was screened and eight well populated categories of activities (Ki, Kd, EC50, ED50, activity, potency, inhibition and IC50) were used for a rule-based filtering process to define the labels "active" or "inactive". The "active" compounds for each of the ChEMBL families were thereby defined and these populated our bioactivity-based filtered families. A structure-based clustering step was subsequently performed in order to split families with more than one distinct chemical scaffold. This produced refined families, whose members share both a common chemical scaffold and bioactivity against a common target in ChEMBL. CONCLUSIONS: We have used the Parzen-Rosenblatt machine learning approach to test whether compounds in ChEMBL can be correctly predicted to belong to their appropriate refined families. Validation tests using the refined families gave a significant increase in predictivity compared with the filtered or with the original families. Out of 61,660 queries in our Monte Carlo cross-validation, belonging to 19,639 refined families, 41,300 (66.98%) had the parent family as the top prediction and 53,797 (87.25%) had the parent family in the top four hits. 
Having thus validated our approach, we used it to identify the protein targets associated with the WADA prohibited classes. For compounds where we do not have experimental data, we use their computed patterns of interaction with protein targets to make predictions of bioactivity. We hope that other groups will test these predictions experimentally in the future.
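
A Parzen-Rosenblatt (kernel density) classifier of the kind named above ranks candidate families by the average kernel value between a query and each family's members. The sketch below assumes a Gaussian kernel over plain numeric descriptor vectors; the descriptors, the bandwidth, the family names and the function `parzen_rank` are all assumptions, since the abstract does not specify how the authors represented or compared compounds.

```python
import math


def parzen_rank(query, families, bandwidth=1.0):
    """Rank families by the mean Gaussian kernel between the query and their members.

    query:    tuple of numeric descriptors (hypothetical representation)
    families: dict family name -> list of member descriptor tuples
    """
    def kernel(a, b):
        squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-squared_distance / (2.0 * bandwidth ** 2))

    scores = {
        name: sum(kernel(query, member) for member in members) / len(members)
        for name, members in families.items()
        if members  # skip empty families to avoid division by zero
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical two-dimensional descriptors, for illustration only.
families = {
    "family_a": [(0.0, 0.0), (0.0, 1.0)],
    "family_b": [(5.0, 5.0), (6.0, 5.0)],
}
ranking = parzen_rank((0.2, 0.4), families)
```

A cross-validation query counts as a top-one hit when its parent family is first in `ranking`, and as a top-four hit when it appears among the first four entries.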
Published on June 22, 2013

Drug repositioning: a machine-learning approach through data integration.

Authors: Napolitano F, Zhao Y, Moreira VM, Tagliaferri R, Kere J, D'Amato M, Greco D

Abstract: Existing computational methods for drug repositioning either rely only on the gene expression response of cell lines after treatment, or on drug-to-disease relationships, merging several information levels. However, the noisy nature of the gene expression and the scarcity of genomic data for many diseases are important limitations to such approaches. Here we focused on a drug-centered approach by predicting the therapeutic class of FDA-approved compounds, not considering data concerning the diseases. We propose a novel computational approach to predict drug repositioning based on state-of-the-art machine-learning algorithms. We have integrated multiple layers of information: i) the distances between drugs based on how similar their chemical structures are, ii) how close their targets are within the protein-protein interaction network, and iii) how correlated their gene expression patterns are after treatment. Our classifier reaches high accuracy levels (78%), allowing us to re-interpret the top misclassifications as re-classifications, after rigorous statistical evaluation. Efficient drug repurposing has the potential to significantly impact the whole field of drug development. The results presented here can significantly accelerate the translation into the clinics of known compounds for novel therapeutic uses.
Published in May 2013

Chemical genetics and its potential in cardiac stem cell therapy.

Authors: Vieira JM, Riley PR

Abstract: Over the last decade or so, intensive research in cardiac stem cell biology has led to significant discoveries towards a potential therapy for cardiovascular disease, the main cause of morbidity and mortality in humans. The major goal within the field of cardiovascular regenerative medicine is to replace lost or damaged cardiac muscle and coronaries following ischaemic disease. At present, de novo cardiomyocytes can be generated either in vitro, for cell transplantation or disease modelling using directed differentiation of embryonic stem cells or induced pluripotent stem cells, or in vivo via direct reprogramming of resident adult cardiac fibroblasts or ectopic stimulation of resident cardiac stem or progenitor cells. A major bottleneck with all of these approaches is the low efficiency of cardiomyocyte differentiation alongside their relative functional immaturity. Chemical genetics, and the application of phenotypic screening with small molecule libraries, represent a means to enhance understanding of the molecular pathways controlling cardiovascular cell differentiation and, moreover, offer the potential for discovery of new drugs to invoke heart repair and regeneration. Here, we review the potential of chemical genetics in cardiac stem cell therapy, highlighting not only the major contributions to the field so far, but also the future challenges.
Published in May 2013

A community-driven global reconstruction of human metabolism.

Authors: Thiele I, Swainston N, Fleming RM, Hoppe A, Sahoo S, Aurich MK, Haraldsdottir H, Mo ML, Rolfsson O, Stobbe MD, Thorleifsson SG, Agren R, Bolling C, Bordel S, Chavali AK, Dobson P, Dunn WB, Endler L, Hala D, Hucka M, Hull D, Jameson D, Jamshidi N, Jonsson JJ, Juty N, Keating S, Nookaew I, Le Novere N, Malys N, Mazein A, Papin JA, Price ND, Selkov E Sr, Sigurdsson MI, Simeonidis E, Sonnenschein N, Smallbone K, Sorokin A, van Beek JH, Weichart D, Goryanin I, Nielsen J, Westerhoff HV, Kell DB, Mendes P, Palsson BO

Abstract: Multiple models of human metabolism have been reconstructed, but each represents only a subset of our knowledge. Here we describe Recon 2, a community-driven, consensus 'metabolic reconstruction', which is the most comprehensive representation of human metabolism that is applicable to computational modeling. Compared with its predecessors, the reconstruction has improved topological and functional features, including approximately 2x more reactions and approximately 1.7x more unique metabolites. Using Recon 2 we predicted changes in metabolite biomarkers for 49 inborn errors of metabolism with 77% accuracy when compared to experimental data. Mapping metabolomic data and drug information onto Recon 2 demonstrates its potential for integrating and analyzing diverse data types. Using protein expression data, we automatically generated a compendium of 65 cell type-specific models, providing a basis for manual curation or investigation of cell-specific metabolic properties. Recon 2 will facilitate many future biomedical studies and is freely available at