Published in 2016

HITSZ_CDR: an end-to-end chemical and disease relation extraction system for BioCreative V.

Authors: Li H, Tang B, Chen Q, Chen K, Wang X, Wang B, Wang Z

Abstract: In this article, an end-to-end system was proposed for the challenge task of disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction in BioCreative V, where DNER includes disease mention recognition (DMR) and normalization (DN). Evaluation on the challenge corpus showed that our system achieved the highest F1-scores: 86.93% on DMR, 84.11% on DN, and 43.04% on CID relation extraction. The F1-score on DMR is higher than our previous one reported by the challenge organizers (86.76%), the highest F1-score of the challenge.
Published in 2016

'RE:fine drugs': an interactive dashboard to access drug repurposing opportunities.

Authors: Moosavinasab S, Patterson J, Strouse R, Rastegar-Mojarad M, Regan K, Payne PR, Huang Y, Lin SM

Abstract: The process of discovering new drugs has been extremely costly and slow in recent decades despite enormous investment in pharmaceutical research. Drug repurposing enables researchers to speed up the process of discovering other conditions that existing drugs can effectively treat, with low cost and fast FDA approval. Here, we introduce 'RE:fine Drugs', a freely available interactive website for integrated search and discovery of drug repurposing candidates from GWAS and PheWAS repurposing datasets constructed using previously reported methods in Nature Biotechnology. 'RE:fine Drugs' demonstrates the possibilities to identify and prioritize novelty of candidates for drug repurposing based on the theory of transitive Drug-Gene-Disease triads. This public website provides a starting point for research, industry, clinical and regulatory communities to accelerate the investigation and validation of new therapeutic uses of old drugs.
Published in 2016

Drug-Drug Interaction Extraction via Convolutional Neural Networks.

Authors: Liu S, Tang B, Chen Q, Wang X

Abstract: Drug-drug interaction (DDI) extraction, a typical relation extraction task in natural language processing (NLP), has always attracted great attention. Most state-of-the-art DDI extraction systems are based on support vector machines (SVM) with a large number of manually defined features. Recently, convolutional neural networks (CNNs), a robust machine learning method that needs almost no manually defined features, have exhibited great potential for many NLP tasks. It is worth employing CNNs for DDI extraction, which has never been investigated. We proposed a CNN-based method for DDI extraction. Experiments conducted on the 2013 DDIExtraction challenge corpus demonstrate that CNN is a good choice for DDI extraction. The CNN-based DDI extraction method achieves an F-score of 69.75%, which outperforms the existing best performing method by 2.75%.
Published in 2016

A Survey of Computational Tools to Analyze and Interpret Whole Exome Sequencing Data.

Authors: Hintzsche JD, Robinson WA, Tan AC

Abstract: Whole Exome Sequencing (WES) is the application of the next-generation technology to determine the variations in the exome and is becoming a standard approach in studying genetic variants in diseases. Understanding the exomes of individuals at single base resolution allows the identification of actionable mutations for disease treatment and management. WES technologies have shifted the bottleneck in experimental data production to computationally intensive informatics-based data analysis. Novel computational tools and methods have been developed to analyze and interpret WES data. Here, we review some of the current tools that are being used to analyze WES data. These tools range from the alignment of raw sequencing reads all the way to linking variants to actionable therapeutics. Strengths and weaknesses of each tool are discussed to help researchers make more informed decisions when selecting the best tools to analyze their WES data.
Published in 2016

An Integrated Data Driven Approach to Drug Repositioning Using Gene-Disease Associations.

Authors: Mullen J, Cockell SJ, Woollard P, Wipat A

Abstract: Drug development is increasing in cost whilst decreasing in productivity. There is a general acceptance that the current paradigm of R&D needs to change. One alternative approach is drug repositioning. With target-based approaches utilised heavily in the field of drug discovery, it becomes increasingly necessary to have a systematic method to rank gene-disease associations. Although methods already exist to collect, integrate and score these associations, they are often not a reliable reflection of expert knowledge. Furthermore, the amount of data available in all areas covered by bioinformatics is increasing dramatically year on year. It thus makes sense to move away from more generalised hypothesis driven approaches to research to one that allows data to generate their own hypotheses. We introduce an integrated, data driven approach to drug repositioning. We first apply a Bayesian statistics approach to rank 309,885 gene-disease associations using existing knowledge. Ranked associations are then integrated with other biological data to produce a semantically-rich drug discovery network. Using this network, we show how our approach identifies diseases of the central nervous system (CNS) to be an area of interest. CNS disorders are identified due to the low numbers of such disorders that currently have marketed treatments, in comparison to other therapeutic areas. We then systematically mine our network for semantic subgraphs that allow us to infer drug-disease relations that are not captured in the network. We identify and rank 275,934 drug-disease has_indication associations after filtering those that are more likely to be side effects, whilst commenting on the top ranked associations in more detail. The dataset has been created in Neo4j and is available for download at along with a Java implementation of the searching algorithm.
Published on December 27, 2016

Neuroblastoma, a Paradigm for Big Data Science in Pediatric Oncology.

Authors: Salazar BM, Balczewski EA, Ung CY, Zhu S

Abstract: Pediatric cancers rarely exhibit recurrent mutational events when compared to most adult cancers. This poses a challenge in understanding how cancers initiate, progress, and metastasize in early childhood. Also, due to limited detected driver mutations, it is difficult to benchmark key genes for drug development. In this review, we use neuroblastoma, a pediatric solid tumor of neural crest origin, as a paradigm for exploring "big data" applications in pediatric oncology. Computational strategies derived from big data science (network- and machine learning-based modeling and drug repositioning) hold the promise of shedding new light on the molecular mechanisms driving neuroblastoma pathogenesis and identifying potential therapeutics to combat this devastating disease. These strategies integrate robust data input, from genomic and transcriptomic studies, clinical data, and in vivo and in vitro experimental models specific to neuroblastoma and other types of cancers that closely mimic its biological characteristics. We discuss contexts in which "big data" and computational approaches, especially network-based modeling, may advance neuroblastoma research, describe currently available data and resources, and propose future models of strategic data collection and analyses for neuroblastoma and other related diseases.
Published on December 23, 2016

The extraction of drug-disease correlations based on module distance in incomplete human interactome.

Authors: Yu L, Wang B, Ma X, Gao L

Abstract: BACKGROUND: Extracting drug-disease correlations is crucial in unveiling disease mechanisms, as well as discovering new indications of available drugs, or drug repositioning. Both the interactome and the knowledge of disease-associated and drug-associated genes remain incomplete. RESULTS: We present a new method to predict the associations between drugs and diseases. Our method is based on a module distance, which was originally proposed to calculate distances between modules in the incomplete human interactome. We first map all the disease genes and drug genes to a combined protein interaction network. Then, based on the module distance, we calculate the distances between drug gene sets and disease gene sets, and take the distances as the relationships of drug-disease pairs. We also filter possible false positive drug-disease correlations by p-value. Finally, we validate the top-100 drug-disease associations related to six drugs in the predicted results. CONCLUSION: The overlap between our predicted correlations and those reported in the Comparative Toxicogenomics Database (CTD) and the literature, together with their enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, demonstrates that our approach can not only effectively identify new drug indications, but also provide new insight into drug-disease discovery.
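Illustrative sketch (not from the paper): the distance step described in this abstract can be pictured as a network separation score between a drug gene set and a disease gene set. The formula below (mean closest distance between the two sets minus the average of the within-set closest distances) is one published form of module separation and is only an assumption about what the authors mean by "module distance". The toy graph, the gene names and all function names are hypothetical.

```python
from collections import deque

# Hypothetical toy interactome: an undirected path g1 - g2 - ... - g6.
GRAPH = {
    "g1": ["g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2", "g4"],
    "g4": ["g3", "g5"],
    "g5": ["g4", "g6"],
    "g6": ["g5"],
}


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closest_distances(graph, from_set, to_set, skip_self=False):
    """For each node in from_set, the distance to its nearest node in to_set."""
    result = []
    for a in from_set:
        dist = bfs_distances(graph, a)
        candidates = [dist[b] for b in to_set
                      if b in dist and not (skip_self and b == a)]
        if candidates:
            result.append(min(candidates))
    return result


def mean(values):
    # Assumes a non-empty list; a one-gene set has no within-set distance.
    return sum(values) / len(values)


def separation(graph, set_a, set_b):
    """Assumed separation score: d_AB - (d_AA + d_BB) / 2."""
    d_aa = mean(closest_distances(graph, set_a, set_a, skip_self=True))
    d_bb = mean(closest_distances(graph, set_b, set_b, skip_self=True))
    d_ab = mean(closest_distances(graph, set_a, set_b)
                + closest_distances(graph, set_b, set_a))
    return d_ab - (d_aa + d_bb) / 2


far = separation(GRAPH, {"g1", "g2"}, {"g5", "g6"})      # well-separated sets
overlap = separation(GRAPH, {"g1", "g2"}, {"g2", "g3"})  # sets sharing g2
```

On this toy path, `far` is 2.5 and `overlap` is -0.5: under the assumed formula, lower (negative) values indicate gene sets that sit in the same network neighbourhood. The p-value filtering mentioned in the abstract is not covered here.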
Published on December 23, 2016

Inferring new indications for approved drugs via random walk on drug-disease heterogenous networks.

Authors: Liu H, Song Y, Guan J, Luo L, Zhuang Z

Abstract: BACKGROUND: Since traditional drug research and development is often time-consuming and high-risk, there is an increasing interest in establishing new medical indications for approved drugs, referred to as drug repositioning, which provides a relatively low-cost and high-efficiency approach for drug discovery. With the explosive growth of large-scale biochemical and phenotypic data, drug repositioning holds great potential for precision medicine in the post-genomic era. It is urgent to develop rational and systematic approaches to predict new indications for approved drugs on a large scale. RESULTS: In this paper, we propose the two-pass random walks with restart on a heterogeneous network, TP-NRWRH for short, to predict new indications for approved drugs. Rather than performing a random walk on a bipartite network, we integrated the drug-drug similarity network, disease-disease similarity network and known drug-disease association network into one heterogeneous network, on which the two-pass random walks with restart are implemented. We have conducted performance evaluation on two datasets of drug-disease associations, and the results show that our method has higher performance than six existing methods. A case study on Alzheimer's disease showed that nine of the top 10 predicted drugs have been approved or are investigational for neurodegenerative diseases. The experimental results show that our method achieves state-of-the-art performance in predicting new indications for approved drugs. CONCLUSIONS: We proposed a two-pass random walk with restart on the drug-disease heterogeneous network, referred to as TP-NRWRH, to predict new indications for approved drugs. Performance evaluation on two independent datasets showed that TP-NRWRH achieved higher performance than six existing methods on 10-fold cross validations. The case study on Alzheimer's disease showed that nine of the top 10 predicted drugs have been approved or are investigational for neurodegenerative diseases. The results show that our method achieves state-of-the-art performance in predicting new indications for approved drugs.
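Illustrative sketch (not from the paper): a plain, single-pass random walk with restart on a small heterogeneous network, to show the general idea of combining drug-drug similarity, disease-disease similarity and known drug-disease associations in one graph. It does not reproduce the two-pass scheme of TP-NRWRH, whose details are not given in the abstract. The node names, edge weights, restart probability and function names are all hypothetical.

```python
def column_normalize(matrix):
    """Scale each column to sum to 1 so it acts as a transition distribution."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def random_walk_with_restart(adjacency, seed_index, restart=0.3,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change is below tol."""
    n = len(adjacency)
    w = column_normalize(adjacency)
    p0 = [1.0 if i == seed_index else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical network: three drugs and two diseases in one symmetric matrix.
# drug-drug and disease-disease entries are similarities; drug-disease
# entries of 1.0 are known associations.
NODES = ["drugA", "drugB", "drugC", "disX", "disY"]
ADJACENCY = [
    [0.0, 0.8, 0.0, 1.0, 0.0],  # drugA: similar to drugB, treats disX
    [0.8, 0.0, 0.2, 0.0, 0.0],  # drugB: no known indication
    [0.0, 0.2, 0.0, 0.0, 1.0],  # drugC: weakly similar to drugB, treats disY
    [1.0, 0.0, 0.0, 0.0, 0.5],  # disX
    [0.0, 0.0, 1.0, 0.5, 0.0],  # disY
]

# Walk from drugB and read off the steady-state scores of the disease nodes.
SCORES = dict(zip(NODES, random_walk_with_restart(ADJACENCY, NODES.index("drugB"))))
```

In this toy network the walker seeded at `drugB` ends up scoring `disX` above `disY`, because `drugB` is much more similar to `drugA` (which treats `disX`) than to `drugC`. Ranking the disease scores for a seeded drug is the sense in which such a walk proposes candidate indications.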
Published on December 22, 2016

Drug-target interaction prediction via class imbalance-aware ensemble learning.

Authors: Ezzat A, Wu M, Li XL, Kwoh CK

Abstract: BACKGROUND: Multiple computational methods for predicting drug-target interactions have been developed to facilitate the drug discovery process. These methods use available data on known drug-target interactions to train classifiers with the purpose of predicting new undiscovered interactions. However, a key challenge regarding this data that has not yet been addressed by these methods, namely class imbalance, is potentially degrading the prediction performance. Class imbalance can be divided into two sub-problems. Firstly, the number of known interacting drug-target pairs is much smaller than that of non-interacting drug-target pairs. This imbalance ratio between interacting and non-interacting drug-target pairs is referred to as the between-class imbalance. Between-class imbalance degrades prediction performance due to the bias in prediction results towards the majority class (i.e. the non-interacting pairs), leading to more prediction errors in the minority class (i.e. the interacting pairs). Secondly, there are multiple types of drug-target interactions in the data, with some types having relatively fewer members (or being less represented) than others. This variation in representation of the different interaction types leads to another kind of imbalance referred to as the within-class imbalance. In within-class imbalance, prediction results are biased towards the better represented interaction types, leading to more prediction errors in the less represented interaction types. RESULTS: We propose an ensemble learning method that incorporates techniques to address the issues of between-class imbalance and within-class imbalance. Experiments show that the proposed method improves results over four state-of-the-art methods. In addition, we simulated cases for new drugs and targets to see how our method would perform in predicting their interactions. New drugs and targets are those for which no prior interactions are known. Our method displayed satisfactory prediction performance and was able to predict many of the interactions successfully. CONCLUSIONS: Our proposed method has improved the prediction performance over the existing work, thus proving the importance of addressing problems pertaining to class imbalance in the data.
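Illustrative sketch (not from the paper): one common way to counter between-class imbalance is to train each ensemble member on all minority examples plus an equally sized random subsample of the majority class, then average the members' votes. The code below shows that idea with a deliberately simple nearest-centroid base learner. It is an assumption about the general technique, not the authors' method, and it does not address within-class imbalance. The feature vectors, names and parameters are hypothetical.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_balanced_ensemble(positives, negatives, n_models=10, seed=0):
    """Each member sees all positives and an equal-sized sample of negatives."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sampled_neg = rng.sample(negatives, len(positives))
        models.append((centroid(positives), centroid(sampled_neg)))
    return models


def predict_score(models, x):
    """Fraction of members that place x closer to the positive centroid."""
    votes = sum(1 for pos_c, neg_c in models
                if sq_dist(x, pos_c) < sq_dist(x, neg_c))
    return votes / len(models)


# Hypothetical drug-target pair features: 3 interacting pairs near (1, 1)
# versus 49 non-interacting pairs on a grid around (0, 0).
POSITIVES = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
NEGATIVES = [(i * 0.1, j * 0.1) for i in range(-3, 4) for j in range(-3, 4)]

MODELS = train_balanced_ensemble(POSITIVES, NEGATIVES)
likely = predict_score(MODELS, (1.1, 1.0))    # close to the known interactions
unlikely = predict_score(MODELS, (0.0, 0.0))  # close to the non-interacting pairs
```

Because every member is trained on a 1:1 class ratio, no single model is dominated by the non-interacting pairs, while the ensemble as a whole still uses many different negatives.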
Published on December 13, 2016

Differences in reproductive toxicology between alopecia drugs: an analysis on adverse events among female and male cases.

Authors: Wu M, Yu Q, Li Q

Abstract: Alopecia is a dermatological condition with limited therapeutic options. Only two drugs, finasteride and minoxidil, are approved by the FDA for alopecia treatment. However, little is known about the differences in adverse effects between these two drugs. We examined the clinical reports submitted to the FDA Adverse Event Reporting System (FAERS) from 2004 to 2014. For both females and males, finasteride was found to be more associated with reproductive toxicity as compared to minoxidil. Among male alopecia cases, finasteride was significantly more concurrent with several forms of sexual dysfunction. Among female alopecia cases, finasteride was significantly more concurrent with harm to the fetus and disorders of the uterus. In addition, drug-gene network analysis indicated that finasteride could profoundly disturb pathways related to sex hormone signaling and oocyte maturation. These findings could provide clues for subsequent toxicological research. Taken together, this analysis suggested that finasteride could be more liable to various reproductive adverse effects. Some of these adverse effects are not yet covered by warnings in the FDA-approved drug label. This information can help improve the treatment regimen of alopecia and post-marketing regulation of drug products.