Published on August 15, 2014

SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases.

Authors: Schweiger D, Trajanoski Z, Pabinger S

Abstract: BACKGROUND: The Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. RESULTS: SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers. CONCLUSIONS: This new graphical way of creating queries for biological Semantic Web databases considerably improves usability, as it removes the requirement of knowing specific query languages and database structures. The system is freely available at
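
To illustrate how a visual query graph can be turned into a query, here is a minimal Python sketch. It is not SPARQLGraph's implementation. The function name `graph_to_sparql`, the edge-list representation, and the `example.org` class IRI are assumptions made for illustration; the abstract does not describe the internal format.

```python
from typing import List, Tuple

# One edge of the query graph: (subject, predicate, object).
Triple = Tuple[str, str, str]


def graph_to_sparql(edges: List[Triple], select: List[str], limit: int = 10) -> str:
    """Turn a list of graph edges into a SPARQL SELECT query string.

    Terms starting with '?' are variables. Everything else is emitted verbatim,
    so callers must pass full IRIs in angle brackets or quoted literals.
    """
    if not edges:
        raise ValueError("the query graph needs at least one edge")
    patterns = "\n".join(f"  {s} {p} {o} ." for s, p, o in edges)
    variables = " ".join(select)
    return f"SELECT DISTINCT {variables}\nWHERE {{\n{patterns}\n}}\nLIMIT {limit}"


# Hypothetical query graph: a protein node connected to its label.
example_edges = [
    ("?protein", "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>", "<http://example.org/Protein>"),
    ("?protein", "<http://www.w3.org/2000/01/rdf-schema#label>", "?label"),
]
query = graph_to_sparql(example_edges, ["?protein", "?label"], limit=5)
print(query)
```

Each edge becomes one triple pattern in the WHERE clause. Sending the query to an endpoint is left out on purpose, since the endpoint address is not given here.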
Published on August 8, 2014

A phenome-guided drug repositioning through a latent variable model.

Authors: Bisgin H, Liu Z, Fang H, Kelly R, Xu X, Tong W

Abstract: BACKGROUND: The phenome represents a distinct set of information in the human population. It has been explored particularly in its relationship with the genome to identify correlations for diseases. The phenome has also been explored for drug repositioning, with efforts focusing on the search space for the most similar candidate drugs. For a comprehensive analysis of the phenome, we assumed that all phenotypes (indications and side effects) were inter-connected with a probabilistic distribution and that this characteristic may offer an opportunity to identify new therapeutic indications for a given drug. Correspondingly, we employed Latent Dirichlet Allocation (LDA), which introduces latent variables (topics) to govern the phenome distribution. RESULTS: We developed our model on the phenome information in the Side Effect Resource (SIDER). We first developed an LDA model optimized for its recovery potential by perturbing the drug-phenotype matrix: each drug-indication relationship was switched to "unknown" one at a time and then recovered based on the remaining drug-phenotype pairs. Of the probabilistically significant pairs, 70% were successfully recovered. Next, we applied the model to the whole phenome to narrow down repositioning candidates and suggest alternative indications. We were able to retrieve approved indications of 6 drugs whose indications were not listed in SIDER. For 908 drugs that were present with their indication information, our model suggested alternative treatment options for further investigation. Several of the suggested new uses can be supported by information from the scientific literature. CONCLUSIONS: The results demonstrated that the phenome can be further analyzed by a generative model, which can discover probabilistic associations between drugs and therapeutic uses. In this regard, LDA serves as an enrichment tool to explore new uses of existing drugs by narrowing down the search space.
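
The perturb-and-recover evaluation described in the abstract can be sketched as follows. This sketch covers only the hold-out loop. The scorer is a simple co-occurrence stand-in, not LDA, and the drug names, phenotypes, and the `top_k` criterion are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

Profiles = Dict[str, Set[str]]  # drug -> set of phenotypes (indications and side effects)


def cooccurrence_score(profiles: Profiles, drug: str, phenotype: str) -> float:
    """Stand-in scorer (NOT LDA): summed Jaccard similarity to other drugs carrying the phenotype."""
    own = profiles[drug]
    total = 0.0
    for other, phenos in profiles.items():
        if other == drug or phenotype not in phenos:
            continue
        union = own | phenos
        if union:
            total += len(own & phenos) / len(union)
    return total


def leave_one_out_recovery(
    profiles: Profiles,
    pairs: List[Tuple[str, str]],
    score: Callable[[Profiles, str, str], float],
    top_k: int = 1,
) -> float:
    """Hide each known drug-indication pair in turn and check whether it is ranked back in the top k.

    Assumes every pair in `pairs` is present in `profiles`.
    """
    all_phenotypes = set().union(*profiles.values())
    recovered = 0
    for drug, indication in pairs:
        # Switch the pair to "unknown" by removing it from a copy of the matrix.
        masked = {d: set(p) for d, p in profiles.items()}
        masked[drug].discard(indication)
        candidates = all_phenotypes - masked[drug]
        scores = {ph: score(masked, drug, ph) for ph in candidates}
        held_out = scores[indication]
        # Pessimistic rank: ties count against the held-out indication.
        rank = 1 + sum(1 for ph, s in scores.items() if ph != indication and s >= held_out)
        if held_out > 0 and rank <= top_k:
            recovered += 1
    return recovered / len(pairs) if pairs else 0.0


# Made-up toy data, for illustration only.
toy_profiles = {
    "drugA": {"hypertension", "cough", "dizziness"},
    "drugB": {"hypertension", "cough", "headache"},
    "drugC": {"hypertension", "dizziness"},
    "drugD": {"asthma", "tremor"},
}
toy_pairs = [("drugA", "hypertension"), ("drugD", "asthma")]
rate = leave_one_out_recovery(toy_profiles, toy_pairs, cooccurrence_score)
print(f"recovery rate: {rate:.2f}")
```

In this toy data the hypertension link of drugA is recovered because two other drugs share its side effects, while the asthma link of drugD has no supporting drug and is not. Any model that scores drug-phenotype pairs, including a fitted topic model, could be passed in as `score`.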
Published in July 2014

Immune-related chemotactic factors were found in acute coronary syndromes by bioinformatics.

Authors: Zhang L, Li J, Liang A, Liu Y, Deng B, Wang H

Abstract: DNA microarray data for thrombus-related leukocytes from patients with acute coronary syndrome (ACS) were analyzed to identify key genes associated with ACS. Microarray data set GSE19339, comprising samples from four ACS patients and four normal samples, was downloaded from the Gene Expression Omnibus database. Raw data were pre-processed and differentially expressed genes (DEGs) were identified with the Affy packages in R. The interaction network was established with STRING. DrugBank was searched to obtain relevant small molecules. A total of 487 DEGs were identified between normal and disease samples. Among these, ten up-regulated genes belonging to the chemokine family (CCL2, CCR1, CXCL3, CXCL2, CCL8, CXCL11, CCL7, IL10, CCL22 and CCL20) were related to the inflammatory response. In addition, two inhibitors of CCL2 (L-Mimosine) were retrieved from the DrugBank database. Considering the role of the inflammatory response in the progression of ACS and the functions of the ten up-regulated genes, we speculated that these genes might be related to ACS. Moreover, the inhibitors could provide guidance for the future design of drugs acting on these genes.
Published in July 2014

Whipworm genome and dual-species transcriptome analyses provide molecular insights into an intimate host-parasite interaction.

Authors: Foth BJ, Tsai IJ, Reid AJ, Bancroft AJ, Nichol S, Tracey A, Holroyd N, Cotton JA, Stanley EJ, Zarowiecki M, Liu JZ, Huckvale T, Cooper PJ, Grencis RK, Berriman M

Abstract: Whipworms are common soil-transmitted helminths that cause debilitating chronic infections in man. These nematodes are only distantly related to Caenorhabditis elegans and have evolved to occupy an unusual niche, tunneling through epithelial cells of the large intestine. We report here the whole-genome sequences of the human-infective Trichuris trichiura and the mouse laboratory model Trichuris muris. On the basis of whole-transcriptome analyses, we identify many genes that are expressed in a sex- or life stage-specific manner and characterize the transcriptional landscape of a morphological region with unique biological adaptations, namely, bacillary band and stichosome, found only in whipworms and related parasites. Using RNA sequencing data from whipworm-infected mice, we describe the regulated T helper 1 (TH1)-like immune response of the chronically infected cecum in unprecedented detail. In silico screening identified numerous new potential drug targets against trichuriasis. Together, these genomes and associated functional data elucidate key aspects of the molecular host-parasite interactions that define chronic whipworm infection.
Published in July 2014

Many approved drugs have bioactive analogs with different target annotations.

Authors: Hu Y, Lounkine E, Bajorath J

Abstract: Close structural relationships between approved drugs and bioactive compounds were systematically assessed using matched molecular pairs. For structural analogs of drugs, target information was assembled from ChEMBL and compared to drug targets reported in DrugBank. For many drugs, multiple analogs were identified that were active against different targets. Some of these additional targets were closely related to known drug targets while others were not. Surprising discrepancies between reported drug targets and targets of close structural analogs were often observed. On one hand, the results suggest that hypotheses concerning alternative drug targets can often be formulated on the basis of close structural relationships to bioactive compounds that are easily detectable. It is conceivable that such obvious structure-target relationships are frequently not considered (or might be overlooked) when compounds are developed with a focus on a primary target and a few related (or undesired) ones. On the other hand, our findings also raise questions concerning database content and drug repositioning efforts.
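
The comparison of drug targets with analog targets amounts to a set difference per drug. The sketch below shows that step only. The function name, the dictionary layout, and all drug, analog, and target identifiers are hypothetical; identifying matched molecular pairs and retrieving annotations from ChEMBL or DrugBank are not shown.

```python
from typing import Dict, Set


def additional_analog_targets(
    drug_targets: Dict[str, Set[str]],
    analog_targets: Dict[str, Dict[str, Set[str]]],
) -> Dict[str, Set[str]]:
    """For each drug, collect targets reported for its analogs but not for the drug itself."""
    result = {}
    for drug, analogs in analog_targets.items():
        known = drug_targets.get(drug, set())
        extra = set()
        for targets in analogs.values():
            extra |= targets - known
        result[drug] = extra
    return result


# Hypothetical annotations, for illustration only.
drug_targets = {"drug1": {"T1", "T2"}}
analog_targets = {"drug1": {"analogA": {"T1", "T3"}, "analogB": {"T2", "T4"}}}
print(additional_analog_targets(drug_targets, analog_targets))
```

For the toy input the additional targets of `drug1` are `T3` and `T4`, which would be the starting point for a hypothesis about alternative drug targets.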
Published in July 2014

DINIES: drug-target interaction network inference engine based on supervised analysis.

Authors: Yamanishi Y, Kotera M, Moriya Y, Sawada R, Kanehisa M, Goto S

Abstract: DINIES (drug-target interaction network inference engine based on supervised analysis) is a web server for predicting unknown drug-target interaction networks from various types of biological data (e.g. chemical structures, drug side effects, amino acid sequences and protein domains) in the framework of supervised network inference. The originality of DINIES lies in prediction with state-of-the-art machine learning methods, in the integration of heterogeneous biological data and in compatibility with the KEGG database. The DINIES server accepts any 'profiles' or precalculated similarity matrices (or 'kernels') of drugs and target proteins in tab-delimited file format. When a training data set is submitted to learn a predictive model, users can select either known interaction information in the KEGG DRUG database or their own interaction data. The user can also select an algorithm for supervised network inference, select various parameters in the method and specify weights for heterogeneous data integration. The server can provide integrative analyses with useful components in KEGG, such as biological pathways, functional hierarchy and human diseases. DINIES is publicly available as one of the genome analysis tools in GenomeNet.
Published in July 2014

Robustness and evolvability of the human signaling network.

Authors: Kim J, Vandamme D, Kim JR, Munoz AG, Kolch W, Cho KH

Abstract: Biological systems are known to be both robust and evolvable to internal and external perturbations, but what causes these apparently contradictory properties? We used Boolean network modeling and attractor landscape analysis to investigate the evolvability and robustness of the human signaling network. Our results show that the human signaling network can be divided into an evolvable core where perturbations change the attractor landscape in state space, and a robust neighbor where perturbations have no effect on the attractor landscape. Using chemical inhibition and overexpression of nodes, we validated that perturbations affect the evolvable core more strongly than the robust neighbor. We also found that the evolvable core has a distinct network structure, which is enriched in feedback loops, and features a higher degree of scale-freeness and longer path lengths connecting the nodes. In addition, the genes with high evolvability scores are associated with evolvability-related properties such as rapid evolvability, low species broadness, and immunity whereas the genes with high robustness scores are associated with robustness-related properties such as slow evolvability, high species broadness, and oncogenes. Intriguingly, US Food and Drug Administration-approved drug targets have high evolvability scores whereas experimental drug targets have high robustness scores.
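
As a small illustration of Boolean network modeling and attractor analysis, the sketch below enumerates the attractors of a synchronous Boolean network and compares an unperturbed network with a node knockout. The three-node toggle network and its update rules are hypothetical. They are not taken from the human signaling network studied here, and exhaustive enumeration of this kind is only feasible for very small networks.

```python
from itertools import product
from typing import Callable, FrozenSet, Set, Tuple

State = Tuple[bool, ...]


def find_attractors(update: Callable[[State], State], n_nodes: int) -> Set[FrozenSet[State]]:
    """Enumerate attractors of a synchronous Boolean network by simulating every start state."""
    attractors = set()
    for start in product((False, True), repeat=n_nodes):
        seen = {}   # state -> position on the trajectory
        path = []
        state = start
        while state not in seen:
            seen[state] = len(path)
            path.append(state)
            state = update(state)
        # The trajectory re-entered a visited state: everything from there on is the attractor.
        attractors.add(frozenset(path[seen[state]:]))
    return attractors


def wild_type(state: State) -> State:
    """Hypothetical toy rules: A and B inhibit each other, C is on when both are on."""
    a, b, c = state
    return (not b, not a, a and b)


def knockout_a(state: State) -> State:
    """Same network with node A held off, mimicking chemical inhibition of A."""
    a, b, c = state
    return (False, not a, a and b)


print(len(find_attractors(wild_type, 3)), "attractors in the unperturbed network")
print(len(find_attractors(knockout_a, 3)), "attractor after knocking out A")
```

In this toy network the unperturbed dynamics have two fixed points and one two-state cycle, and knocking out A collapses them into a single fixed point. A perturbation that reshapes the attractor landscape in this way is what the abstract associates with the evolvable core; one that leaves the attractors unchanged corresponds to the robust neighbor.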
Published in July 2014

FreeSolv: a database of experimental and calculated hydration free energies, with input files.

Authors: Mobley DL, Guthrie JP

Abstract: This work provides a curated database of experimental and calculated hydration free energies for small neutral molecules in water, along with molecular structures, input files, references, and annotations. We call this the Free Solvation Database, or FreeSolv. Experimental values were taken from prior literature and will continue to be curated, with updated experimental references and data added as they become available. Calculated values are based on alchemical free energy calculations using molecular dynamics simulations. These used the GAFF small molecule force field in TIP3P water with AM1-BCC charges. Values were calculated with the GROMACS simulation package, with full details given in references cited within the database itself. This database builds in part on a previous, 504-molecule database containing similar information. However, additional curation of both experimental data and calculated values has been done here, and the total number of molecules is now up to 643. Additional information is now included in the database, such as SMILES strings, PubChem compound IDs, accurate reference DOIs, and others. One version of the database is provided in the Supporting Information of this article, but as ongoing updates are envisioned, the database is now versioned and hosted online. In addition to providing the database, this work describes its construction process. The database is available free-of-charge via .
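
A database that pairs experimental and calculated hydration free energies lends itself to simple error statistics. The sketch below computes the mean signed error and the root-mean-square error for a list of (experimental, calculated) pairs. The function name and the numeric values are placeholders chosen for illustration, not entries from FreeSolv, and the abstract does not state which statistics the authors report.

```python
import math
from typing import List, Tuple


def error_summary(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (mean signed error, RMSE) of calculated minus experimental values."""
    if not pairs:
        raise ValueError("need at least one (experimental, calculated) pair")
    diffs = [calc - expt for expt, calc in pairs]
    mean_signed = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean_signed, rmse


# Placeholder values in kcal/mol, NOT real database entries.
placeholder_pairs = [(-1.0, -2.0), (-3.0, -2.0)]
mean_signed, rmse = error_summary(placeholder_pairs)
print(f"mean signed error = {mean_signed:.2f}, RMSE = {rmse:.2f}")
```

The mean signed error shows systematic over- or under-estimation, while the RMSE reflects the overall spread, so the two are usually read together.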
Published in July 2014

Alkemio: association of chemicals with biomedical topics by text and data mining.

Authors: Gijon-Correas JA, Andrade-Navarro MA, Fontaine JF

Abstract: The PubMed(R) database of biomedical citations allows the retrieval of scientific articles studying the function of chemicals in biology and medicine. Mining millions of available citations to search for reported associations between chemicals and topics of interest would require substantial human time. We have implemented the Alkemio text mining web tool and SOAP web service to help in this task. The tool uses biomedical articles discussing chemicals (including drugs), predicts their relatedness to the query topic with a naive Bayesian classifier and ranks all chemicals by P-values computed from random simulations. Benchmarks on seven human pathways showed good retrieval performance (areas under the receiver operating characteristic curves ranged from 73.6 to 94.5%). Comparison with existing tools to retrieve chemicals associated with eight diseases showed the higher precision and recall of Alkemio when considering the top 10 candidate chemicals. Alkemio is a high-performing web tool for ranking chemicals for any biomedical topic, and it is free to non-commercial users. AVAILABILITY: ~medlineranker/cms/alkemio.
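
Ranking by P-values computed from random simulations can be sketched with an empirical P-value: the fraction of simulated scores that reach or exceed the observed score. The sketch below is not Alkemio's implementation. The add-one correction is a common convention assumed here, the Gaussian null distribution is arbitrary, and the chemical names and scores are made up.

```python
import random
from typing import Dict, List, Tuple


def empirical_p_value(observed: float, null_scores: List[float]) -> float:
    """Share of simulated scores >= observed, with an add-one correction so P is never 0."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def rank_by_p_value(observed: Dict[str, float], null_scores: List[float]) -> List[Tuple[str, float]]:
    """Sort chemicals by ascending empirical P-value (name breaks ties)."""
    ranked = [(name, empirical_p_value(score, null_scores)) for name, score in observed.items()]
    return sorted(ranked, key=lambda item: (item[1], item[0]))


# Hypothetical setup: scores of random article sets stand in for the simulations.
rng = random.Random(0)
null_scores = [rng.gauss(0.0, 1.0) for _ in range(1000)]
observed_scores = {"chemX": 3.5, "chemY": 0.1, "chemZ": 1.8}
for name, p in rank_by_p_value(observed_scores, null_scores):
    print(name, round(p, 4))
```

The classifier that produces the relatedness scores is outside the scope of this sketch; any scoring function could feed `observed_scores`.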
Published in July 2014

DiseaseConnect: a comprehensive web server for mechanism-based disease-disease connections.

Authors: Liu CC, Tseng YT, Li W, Wu CY, Mayzus I, Rzhetsky A, Sun F, Waterman M, Chen JJ, Chaudhary PM, Loscalzo J, Crandall E, Zhou XJ

Abstract: DiseaseConnect is a web server for the analysis and visualization of comprehensive knowledge of mechanism-based disease connectivity. The traditional disease classification system groups diseases with similar clinical symptoms and phenotypic traits. Thus, diseases with entirely different pathologies could be grouped together, leading to a similar treatment design. Such problems could be avoided if diseases were classified based on their molecular mechanisms. Connecting diseases with similar pathological mechanisms could inspire novel strategies for the effective repositioning of existing drugs and therapies. Although there have been several studies attempting to generate disease connectivity networks, they have not yet utilized the enormous and rapidly growing public repositories of disease-related omics data and literature, two primary resources capable of providing insights into disease connections at an unprecedented level of detail. DiseaseConnect, the first public web server, integrates comprehensive omics and literature data, including a large amount of gene expression data, the Genome-Wide Association Studies catalog, and text-mined knowledge, to discover disease-disease connectivity via common molecular mechanisms. Moreover, clinical comorbidity data and a comprehensive compilation of known drug-disease relationships are additionally utilized for advancing the understanding of the disease landscape and for facilitating the mechanism-based development of new drug treatments.