Published in 2015

Identification of potential drug targets by subtractive genome analysis of Escherichia coli O157:H7: an in silico approach.

Authors: Mondal SI, Ferdous S, Jewel NA, Akter A, Mahmud Z, Islam MM, Afrin T, Karim N

Abstract: Bacterial enteric infections resulting in diarrhea, dysentery, or enteric fever constitute a huge public health problem, with more than a billion episodes of disease annually in developing and developed countries. In this study, the deadly agent of hemorrhagic diarrhea and hemolytic uremic syndrome, Escherichia coli O157:H7, was investigated with extensive computational approaches aimed at identifying novel and broad-spectrum antibiotic targets. A systematic in silico workflow consisting of comparative genomics, metabolic pathway analysis, and additional drug prioritizing parameters was used to identify novel drug targets that were essential for the pathogen's survival but absent in its human host. Comparative genomic analysis of Kyoto Encyclopedia of Genes and Genomes annotated metabolic pathways identified 350 putative target proteins in E. coli O157:H7 which showed no similarity to human proteins. Further bioinformatic approaches, including prediction of subcellular localization, calculation of molecular weight, and web-based investigation of 3D structural characteristics, narrowed the potential drug targets from 350 to 120. Ultimately, 44 non-homologous essential proteins of E. coli O157:H7 were prioritized as eligible novel broad-spectrum antibiotic targets, with DNA polymerase III alpha (dnaE) ranked highest among them. Moreover, the druggability of each identified drug target was evaluated using the DrugBank database. In addition, the 3D structure of dnaE was modeled and explored further by in silico docking with ligands having potential druggability. Finally, we found that the compounds N-coeleneterazine and N-(1,4-dihydro-5H-tetrazol-5-ylidene)-9-oxo-9H-xanthene-2-sulfon-amide were the most suitable ligands of dnaE, and they are hence proposed as potential inhibitors of this target protein. The results of this study could facilitate the discovery and release of new and effective drugs against E. coli O157:H7 and other deadly human bacterial pathogens.
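The core subtractive step described in this abstract (keep proteins that are essential to the pathogen and have no human homolog) can be illustrated with a minimal sketch. The record layout, the E-value cutoff, and the protein names below are hypothetical placeholders and are not taken from the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Protein:
    name: str
    essential: bool
    # Best E-value of a similarity hit against the human proteome;
    # None means no hit was found at all. (Hypothetical field.)
    best_human_evalue: Optional[float]


def subtract_host(proteins: List[Protein], evalue_cutoff: float = 1e-4) -> List[Protein]:
    """Keep essential proteins with no significant human hit.

    The cutoff value is an assumption chosen for illustration only.
    """
    kept = []
    for p in proteins:
        no_homolog = p.best_human_evalue is None or p.best_human_evalue > evalue_cutoff
        if p.essential and no_homolog:
            kept.append(p)
    return kept


# Made-up example records (not real annotation data).
candidates = [
    Protein("protA", essential=True, best_human_evalue=None),
    Protein("protB", essential=True, best_human_evalue=1e-50),
    Protein("protC", essential=False, best_human_evalue=None),
    Protein("protD", essential=True, best_human_evalue=0.5),
]
targets = [p.name for p in subtract_host(candidates)]
print(targets)
```

Later prioritization steps (subcellular localization, molecular weight, structural data) would be further filters applied to the returned list.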
Published in 2015

A Network-Based Target Overlap Score for Characterizing Drug Combinations: High Correlation with Cancer Clinical Trial Results.

Authors: Ligeti B, Penzvalto Z, Vera R, Gyorffy B, Pongor S

Abstract: Drug combinations are highly efficient in systemic treatment of complex multigene diseases such as cancer, diabetes, arthritis and hypertension. Most currently used combinations were found in empirical ways, which limits the speed of discovery for new and more effective combinations. Therefore, there is a substantial need for efficient and fast computational methods. Here, we present a principle that is based on the assumption that perturbations generated by multiple pharmaceutical agents propagate through an interaction network and can cause unexpected amplification at targets not immediately affected by the original drugs. In order to capture this phenomenon, we introduce a novel Target Overlap Score (TOS) that is defined for two pharmaceutical agents as the number of jointly perturbed targets divided by the number of all targets potentially affected by the two agents. We show that this measure is correlated with the known effects of beneficial and deleterious drug combinations taken from the DCDB, TTD and databases. We demonstrate the utility of TOS by correlating the score to the outcome of recent clinical trials evaluating trastuzumab, an effective anticancer agent utilized in combination with anthracycline- and taxane-based systemic chemotherapy in HER2-receptor (erb-b2 receptor tyrosine kinase 2) positive breast cancer.
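As defined in the abstract, the TOS is a ratio of set sizes, which can be sketched as follows. The sketch assumes that the perturbed-target sets have already been obtained (the network propagation step is not shown) and that "all targets potentially affected by the two agents" means the union of the two sets. The function name and the example sets are illustrative only.

```python
def target_overlap_score(targets_a, targets_b):
    """Jointly perturbed targets divided by all targets affected by either agent."""
    a, b = set(targets_a), set(targets_b)
    union = a | b
    if not union:
        # No affected targets at all: define the score as 0 (assumption).
        return 0.0
    return len(a & b) / len(union)


# Hypothetical perturbed-target sets for two agents (illustrative only).
agent_a = {"ERBB2", "EGFR", "PIK3CA", "AKT1"}
agent_b = {"EGFR", "AKT1", "TOP2A"}
score = target_overlap_score(agent_a, agent_b)
print(score)  # 2 shared targets out of 5 in the union
```

Under this reading the score is the Jaccard index of the two target sets, so it ranges from 0 (no shared targets) to 1 (identical sets).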
Published in 2015

Novel therapeutics for coronary artery disease from genome-wide association study data.

Authors: Grover MP, Ballouz S, Mohanasundaram KA, George RA, Goscinski A, Crowley TM, Sherman CD, Wouters MA

Abstract: BACKGROUND: Coronary artery disease (CAD), one of the leading causes of death globally, is influenced by both environmental and genetic risk factors. Gene-centric genome-wide association studies (GWAS) involving cases and controls have been remarkably successful in identifying genetic loci contributing to CAD. Modern in silico platforms, such as candidate gene prediction tools, permit a systematic analysis of GWAS data to identify candidate genes for complex diseases like CAD. Subsequent integration of drug-target data from drug databases with the predicted candidate genes can potentially identify novel therapeutics suitable for repositioning towards treatment of CAD. METHODS: Previously, we were able to predict 264 candidate genes and 104 potential therapeutic targets for CAD using Gentrepid, a candidate gene prediction platform, with two bioinformatic modules to reanalyze Wellcome Trust Case-Control Consortium GWAS data. In an expanded study, using five bioinformatic modules on the same data, Gentrepid predicted 647 candidate genes and successfully replicated 55% of the candidate genes identified by the more powerful CARDIoGRAMplusC4D consortium meta-analysis. Hence, Gentrepid was capable of enhancing lower quality genotype-phenotype data, using an independent knowledgebase of existing biological data. Here, we used our methodology to integrate drug data from three drug databases (the Therapeutic Target Database, PharmGKB and DrugBank) with the 647 candidate gene predictions from Gentrepid. We utilized known CAD targets, the scientific literature, existing drug data and the CARDIoGRAMplusC4D meta-analysis study as benchmarks to validate Gentrepid predictions for CAD. RESULTS: Our analysis identified a total of 184 predicted candidate genes as novel therapeutic targets for CAD, and 981 novel therapeutics feasible for repositioning in clinical trials towards treatment of CAD. The benchmarks based on known CAD targets and the scientific literature showed that our results were significant (p < 0.05). CONCLUSIONS: We have demonstrated that available drugs may potentially be repositioned as novel therapeutics for the treatment of CAD. Drug repositioning can save valuable time and money spent on preclinical and phase I clinical studies.
Published in 2015

In silico search of energy metabolism inhibitors for alternative leishmaniasis treatments.

Authors: Silva LA, Vinaud MC, Castro AM, Cravo PV, Bezerra JC

Abstract: Leishmaniasis is a complex disease that affects mammals and is caused by approximately 20 distinct protozoa from the genus Leishmania. Leishmaniasis is an endemic disease that exerts a large socioeconomic impact on poor and developing countries. The current treatment for leishmaniasis is complex, expensive, and poorly efficacious. Thus, there is an urgent need to develop more selective, less expensive new drugs. The energy metabolism pathways of Leishmania include several interesting targets for specific inhibitors. In the present study, we sought to establish which energy metabolism enzymes in Leishmania could be targets for inhibitors that have already been approved for the treatment of other diseases. We were able to identify 94 genes and 93 Leishmania energy metabolism targets. Using each gene's designation as a search criterion in the TriTrypDB database, we located the predicted peptide sequences, which in turn were used to interrogate the DrugBank, Therapeutic Target Database (TTD), and PubChem databases. We identified 44 putative targets, 11 of which are predicted to be amenable to inhibition by drugs that have already been approved for use in humans. We propose that these drugs should be experimentally tested and potentially used in the treatment of leishmaniasis.
Published in 2015

Feature engineering for drug name recognition in biomedical texts: feature conjunction and feature selection.

Authors: Liu S, Tang B, Chen Q, Wang X, Fan X

Abstract: Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features such as part-of-speech, word shape, and dictionary features. Features used in current machine learning-based methods are usually singleton features, possibly because combining singleton features into conjunction features leads to a feature explosion and a large number of noisy features. However, singleton features, which can only capture one linguistic characteristic of a word, are not sufficient to describe the information for DNR when multiple characteristics should be considered. In this study, we explore feature conjunction and feature selection for DNR, which have not been reported previously. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, Chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features, and our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge.
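The two ideas named in this abstract, feature conjunction and Chi-square feature selection, can be illustrated with a minimal sketch. The abstract does not specify the two combination schemes used, so the pairwise conjunction below is an assumption, as are the feature names and the toy data. The Chi-square statistic is the standard one for a 2x2 contingency table of feature presence against class label.

```python
from itertools import combinations


def conjoin(singletons):
    """Build pairwise conjunction features from a dict of singleton features.

    Pairwise combination is an assumed scheme, used here for illustration.
    """
    items = sorted(singletons.items())
    return {
        f"{n1}={v1}&{n2}={v2}"
        for (n1, v1), (n2, v2) in combinations(items, 2)
    }


def chi_square(feature_present, labels):
    """Chi-square statistic for a binary feature against a binary label."""
    pairs = list(zip(feature_present, labels))
    a = sum(1 for f, y in pairs if f and y)          # feature on, positive
    b = sum(1 for f, y in pairs if f and not y)      # feature on, negative
    c = sum(1 for f, y in pairs if not f and y)      # feature off, positive
    d = sum(1 for f, y in pairs if not f and not y)  # feature off, negative
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Toy token with two hypothetical singleton features.
conj = conjoin({"pos": "NN", "shape": "Xx"})
print(conj)

# Toy data: the feature fires exactly on the positive examples.
stat = chi_square([1, 1, 0, 0], [1, 1, 0, 0])
print(stat)
```

Ranking conjunction features by such a statistic and keeping the top-scoring ones is one way to limit the feature explosion the abstract mentions.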
Published in 2015

Colorectal cancer drug target prediction using ontology-based inference and network analysis.

Authors: Tao C, Sun J, Zheng WJ, Chen J, Xu H

Abstract: Identification of novel drug targets is a critical step in drug development. Many recent studies have produced multiple types of data, which provides an opportunity to mine the relationships among them to predict drug targets. In this study, we present a novel integrative approach that combines ontology reasoning with network-assisted gene ranking to predict new drug targets. We utilized colorectal cancer (CRC) as a proof-of-concept use case to illustrate the approach. Starting from FDA-approved CRC drugs and the relationships among disease, drug, gene, pathway, and SNP in an ontology representing PharmGKB data, we inferred 113 potential CRC drug targets. We further prioritized these genes based on their relationships with CRC disease genes in the context of human protein-protein interaction networks. Thus, among the 113 potential drug targets, 15 were selected as promising drug targets, including some genes that are supported by previous studies. Among them, EGFR, TOP1 and VEGFA are known targets of FDA-approved drugs. Additionally, CCND1 (cyclin D1) and PTGS2 (prostaglandin-endoperoxide synthase 2) have been reported to be relevant to CRC or to be potential drug targets, based on a literature search. These results indicate that our approach is promising for drug target prediction for CRC treatment, and it might be useful for other cancer therapeutics.
Published in 2015

A 'rule of 0.5' for the metabolite-likeness of approved pharmaceutical drugs.

Authors: O'Hagan S, Swainston N, Handl J, Kell DB

Abstract: We exploit the recent availability of a community reconstruction of the human metabolic network ('Recon2') to study how close in structural terms are marketed drugs to the nearest known metabolite(s) that Recon2 contains. While other encodings using different kinds of chemical fingerprints give greater differences, we find using the 166 Public MDL Molecular Access (MACCS) keys that 90 % of marketed drugs have a Tanimoto similarity of more than 0.5 to the (structurally) 'nearest' human metabolite. This suggests a 'rule of 0.5' mnemonic for assessing the metabolite-like properties that characterise successful, marketed drugs. Multiobjective clustering leads to a similar conclusion, while artificial (synthetic) structures are seen to be less human-metabolite-like. This 'rule of 0.5' may have considerable predictive value in chemical biology and drug discovery, and may represent a powerful filter for decision making processes.
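The 'rule of 0.5' rests on the Tanimoto similarity between a drug's binary fingerprint and that of its nearest metabolite. A minimal sketch follows, assuming fingerprints are represented as sets of on-bit indices. The bit sets and metabolite names are made up for illustration and are not real MACCS keys.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two binary fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        # Two empty fingerprints: define similarity as 0 (assumption).
        return 0.0
    return len(a & b) / len(a | b)


def nearest_metabolite_similarity(drug_bits, metabolite_fps):
    """Highest Tanimoto similarity between a drug and any metabolite."""
    return max(tanimoto(drug_bits, fp) for fp in metabolite_fps.values())


def passes_rule_of_half(drug_bits, metabolite_fps, threshold=0.5):
    """True if the nearest metabolite is more than `threshold` similar."""
    return nearest_metabolite_similarity(drug_bits, metabolite_fps) > threshold


# Made-up fingerprints (on-bit indices), for illustration only.
drug = {1, 5, 9, 42, 100, 160}
metabolites = {
    "metabolite_1": {1, 5, 9, 42, 77},
    "metabolite_2": {3, 4},
}
nearest = nearest_metabolite_similarity(drug, metabolites)
print(nearest, passes_rule_of_half(drug, metabolites))
```

In this toy case the drug shares 4 on-bits with metabolite_1 out of 7 in the union, giving a nearest similarity of 4/7, which is above the 0.5 threshold.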
Published in 2015

Identifying prognostic features by bottom-up approach and correlating to drug repositioning.

Authors: Li W, Yu J, Lian B, Sun H, Li J, Zhang M, Li L, Li Y, Liu Q, Xie L

Abstract: BACKGROUND: Traditionally, a top-down method has been used to identify prognostic features in cancer research. That is, genes differentially expressed in cancer versus normal tissue are identified first and then tested for survival prediction power. The problem is that prognostic features identified from one set of patient samples can rarely be transferred to other datasets. We apply a bottom-up approach in this study: survival-correlated or clinical stage-correlated genes are selected first and then prioritized by their network topology, so that a small set of features can be used as a prognostic signature. METHODS: Gene expression profiles of a cohort of 221 hepatocellular carcinoma (HCC) patients were used as a training set. The 'bottom-up' approach was applied to discover gene-expression signatures associated with survival in both tumor and adjacent non-tumor tissues, and was compared with the 'top-down' approach. The results were validated in a second cohort of 82 patients, which was used as a testing set. RESULTS: Two gene signatures, identified separately in tumor and adjacent non-tumor tissues by the bottom-up approach, were developed in the training cohort. These two signatures were associated with overall survival times of HCC patients, the robustness of each was validated in the testing set, and the predictive performance of each was better than that of previously reported gene expression signatures. Moreover, genes in these two prognostic signatures gave some indications for drug repositioning in HCC: some approved drugs targeting these markers have alternative indications for hepatocellular carcinoma. CONCLUSION: Using the bottom-up approach, we have developed two prognostic gene signatures with a limited number of genes that are associated with overall survival times of patients with HCC. Furthermore, prognostic markers in these two signatures have the potential to be therapeutic targets.
Published in 2015

An Ebola virus-centered knowledge base.

Authors: Kamdar MR, Dumontier M

Abstract: Ebola virus (EBOV), a member of the family Filoviridae, is a NIAID category A, lethal human pathogen. It is responsible for causing Ebola virus disease (EVD), a severe hemorrhagic fever that has a cumulative death rate of 41% in the ongoing epidemic in West Africa. There is an ever-increasing need to consolidate and make available all the knowledge that we possess on EBOV, even if it is conflicting or incomplete. This would enable biomedical researchers to understand the molecular mechanisms underlying this disease and help develop tools for efficient diagnosis and effective treatment. In this article, we present our approach for the development of an Ebola virus-centered Knowledge Base (Ebola-KB) using Linked Data and Semantic Web Technologies. We retrieve and aggregate knowledge from several open data sources, web services and biomedical ontologies. This knowledge is transformed to RDF, linked to the Bio2RDF datasets and made available through a SPARQL 1.1 Endpoint. Ebola-KB can also be explored using an interactive Dashboard visualizing the different perspectives of this integrated knowledge. We showcase how different competency questions, asked by domain users researching the druggability of EBOV, can be formulated as SPARQL Queries or answered using the Ebola-KB Dashboard.
Published in 2015

A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature.

Authors: Tang B, Feng Y, Wang X, Wu Y, Zhang Y, Jiang M, Wang J, Xu H

Abstract: BACKGROUND: Chemical compounds and drugs (together called chemical entities) embedded in scientific articles are crucial for many information extraction tasks in the biomedical domain. However, only a very limited number of chemical entity recognition systems are publicly available, probably due to the lack of large manually annotated corpora. To accelerate the development of chemical entity recognition systems, the Spanish National Cancer Research Center (CNIO) and the University of Navarra organized a challenge on Chemical and Drug Named Entity Recognition (CHEMDNER). The CHEMDNER challenge contains two individual subtasks: 1) Chemical Entity Mention recognition (CEM); and 2) Chemical Document Indexing (CDI). Our study proposes machine learning-based systems for the CEM task. METHODS: The 2013 CHEMDNER challenge organizers provided 10,000 UTF8-encoded PubMed abstracts, manually annotated according to a predefined annotation guideline: a training set of 3,500 abstracts, a development set of 3,500 abstracts and a test set of 3,000 abstracts. We developed machine learning-based systems, based on conditional random fields (CRF) and structured support vector machines (SSVM) respectively, for the CEM task on this data set. The effects of three types of word representation (WR) features, generated by Brown clustering, random indexing and skip-gram, on both machine learning-based systems were also investigated. The performance of our system was evaluated on the test set using scripts provided by the CHEMDNER challenge organizers. Primary evaluation measures were micro Precision, Recall, and F-measure. RESULTS: Our best system was among the top ranked systems with an official micro F-measure of 85.05%. Fixing a bug caused by inconsistent features marginally improved the performance of the system (micro F-measure of 85.20%). CONCLUSIONS: The SSVM-based CEM systems outperformed the CRF-based CEM systems when using the same features. Each type of WR feature was beneficial to the CEM task. Both the CRF-based and SSVM-based systems using all three types of WR features showed better performance than the systems using only one type of WR feature.
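The micro-averaged precision, recall, and F-measure named above pool true positives, false positives, and false negatives over all documents before computing the ratios. A minimal sketch follows, assuming exact-match comparison of mention spans. It is not the official CHEMDNER evaluation script, and the document IDs and character offsets are made up.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall, and F1 over per-document mention sets.

    gold, predicted: dict mapping doc_id -> set of (start, end) spans.
    Exact span match is assumed.
    """
    tp = fp = fn = 0
    for doc_id in set(gold) | set(predicted):
        g = gold.get(doc_id, set())
        p = predicted.get(doc_id, set())
        tp += len(g & p)   # correctly predicted mentions
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up annotations: spans are (start, end) character offsets.
gold = {"doc1": {(0, 7), (12, 20)}, "doc2": {(5, 9)}}
pred = {"doc1": {(0, 7), (30, 35)}, "doc2": {(5, 9)}}
precision, recall, f1 = micro_prf(gold, pred)
print(precision, recall, f1)
```

Here 2 of 3 predicted mentions are correct and 2 of 3 gold mentions are found, so all three measures equal 2/3.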