Publications Search
Explore how scientists all over the world use DrugBank in their research.
Published in January 2015

Functional evaluation of out-of-the-box text-mining tools for data-mining tasks.

Authors: Jung K, LePendu P, Iyer S, Bauer-Mehren A, Percha B, Shah NH

Abstract: OBJECTIVE: The trade-off between the speed and simplicity of dictionary-based term recognition and the richer linguistic information provided by more advanced natural language processing (NLP) is an area of active discussion in clinical informatics. In this paper, we quantify this trade-off among text processing systems that make different trade-offs between speed and linguistic understanding. We tested both types of systems in three clinical research tasks: phase IV safety profiling of a drug, learning adverse drug-drug interactions, and learning used-to-treat relationships between drugs and indications. MATERIALS: We first benchmarked the accuracy of the NCBO Annotator and REVEAL in a manually annotated, publicly available dataset from the 2008 i2b2 Obesity Challenge. We then applied the NCBO Annotator and REVEAL to 9 million clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE) and used the resulting data for three research tasks. RESULTS: There is no significant difference between using the NCBO Annotator and REVEAL in the results of the three research tasks when using large datasets. In one subtask, REVEAL achieved higher sensitivity with smaller datasets. CONCLUSIONS: For a variety of tasks, employing simple term recognition methods instead of advanced NLP methods results in little or no impact on accuracy when using large datasets. Simpler dictionary-based methods have the advantage of scaling well to very large datasets. Promoting the use of simple, dictionary-based methods for population level analyses can advance adoption of NLP in practice.
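The dictionary-based term recognition discussed in this abstract can be illustrated with a minimal sketch. This is not the NCBO Annotator or REVEAL; the dictionary, the labels, the `annotate` function and the example note are all hypothetical and only show the general idea of matching known terms in free text.

```python
import re

# Hypothetical toy dictionary mapping surface terms to concept labels.
TERM_DICTIONARY = {
    "myocardial infarction": "DISEASE",
    "warfarin": "DRUG",
    "aspirin": "DRUG",
}


def annotate(note, dictionary=TERM_DICTIONARY):
    """Return (start, end, term, label) for every dictionary term found in the note."""
    hits = []
    lowered = note.lower()  # case-insensitive matching
    for term, label in dictionary.items():
        # Word boundaries prevent matches inside longer words.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, lowered):
            hits.append((match.start(), match.end(), term, label))
    return sorted(hits)  # ordered by position in the note


# Made-up example note, not taken from any dataset.
hits = annotate("Patient on Warfarin and aspirin after myocardial infarction.")
for start, end, term, label in hits:
    print(start, end, term, label)
```

A lookup of this kind needs no parsing or linguistic model, which is the simplicity and speed side of the trade-off the abstract describes; it also ignores negation and context, which is what richer NLP would add.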
Published in January 2015

The neXtProt knowledgebase on human proteins: current status.

Authors: Gaudet P, Michel PA, Zahn-Zabal M, Cusin I, Duek PD, Evalet O, Gateau A, Gleizes A, Pereira M, Teixeira D, Zhang Y, Lane L, Bairoch A

Abstract: neXtProt (http://www.nextprot.org) is a human protein-centric knowledgebase developed at the SIB Swiss Institute of Bioinformatics. Focused solely on human proteins, neXtProt aims to provide a state-of-the-art resource for the representation of human biology by capturing a wide range of data, precise annotations, fully traceable data provenance and a web interface that enables researchers to find and view information in a comprehensive manner. Since the introductory neXtProt publication, significant advances have been made on three main aspects: the representation of proteomics data, an extended representation of human variants and the development of an advanced search capability built around semantic technologies. These changes are presented in the current neXtProt update.
Published in January 2015

Helminth.net: expansions to Nematode.net and an introduction to Trematode.net.

Authors: Martin J, Rosa BA, Ozersky P, Hallsworth-Pepin K, Zhang X, Bhonagiri-Palsikar V, Tyagi R, Wang Q, Choi YJ, Gao X, McNulty SN, Brindley PJ, Mitreva M

Abstract: Helminth.net (http://www.helminth.net) is the new moniker for a collection of databases: Nematode.net and Trematode.net. Within this collection we provide services and resources for parasitic roundworms (nematodes) and flatworms (trematodes), collectively known as helminths. For over a decade we have provided resources for studying nematodes via our veteran site Nematode.net (http://nematode.net). In this article, (i) we provide an update on the expansions of Nematode.net, which hosts omics data from 84 species and provides advanced search tools to the broad scientific community so that data can be mined in a useful and user-friendly manner, and (ii) we introduce Trematode.net, a site dedicated to the dissemination of data from flukes, flatworm parasites of the class Trematoda, phylum Platyhelminthes. Trematode.net is an independent component of Helminth.net and currently hosts data from 16 species, with information ranging from genomic and functional genomic data and enzymatic pathway utilization to microbiome changes associated with helminth infections. The databases' interface, with a sophisticated query engine as a backbone, is intended to allow users to search for multi-factorial combinations of species' omics properties. This report describes updates to Nematode.net since its last description in NAR, 2012, and also introduces and presents its new sibling site, Trematode.net.
Published in January 2015

Evolutionary constraint and disease associations of post-translational modification sites in human genomes.

Authors: Reimand J, Wagih O, Bader GD

Abstract: Interpreting the impact of human genome variation on phenotype is challenging. The functional effect of protein-coding variants is often predicted using sequence conservation and population frequency data; however, other factors are likely relevant. We hypothesized that variants in protein post-translational modification (PTM) sites contribute to phenotype variation and disease. We analyzed the fraction of rare variants and the non-synonymous to synonymous variant ratio (Ka/Ks) in 7,500 human genomes and found a significant negative selection signal in PTM regions independent of six factors, including conservation, codon usage, and GC-content, that is widely distributed across tissue-specific genes and function classes. PTM regions are also enriched in known disease mutations, suggesting that PTM variation is more likely deleterious. PTM constraint also affects flanking sequence around modified residues and increases around clustered sites, indicating the presence of functionally important short linear motifs. Using target site motifs of 124 kinases, we predict at least approximately 180,000 motif-breaker amino acid residues that disrupt PTM sites when substituted, and highlight kinase motifs that show specific negative selection and enrichment of disease mutations. We provide this dataset with corresponding hypothesized mechanisms as a community resource. As an example of our integrative approach, we propose that PTPN11 variants in Noonan syndrome aberrantly activate the protein by disrupting an uncharacterized cluster of phosphorylation sites. Further, as PTMs are molecular switches that are modulated by drugs, we study mutated binding sites of PTM enzymes in disease genes and define a drug-disease network containing 413 novel predicted disease-gene links.
Published in January 2015

EHFPI: a database and analysis resource of essential host factors for pathogenic infection.

Authors: Liu Y, Xie D, Han L, Bai H, Li F, Wang S, Bo X

Abstract: High-throughput screening and computational technology have greatly changed the face of microbiology in better understanding pathogen-host interactions. Genome-wide RNA interference (RNAi) screens have given rise to a new class of host genes designated as Essential Host Factors (EHFs), whose knockdown effects significantly influence pathogenic infections. Therefore, we present the first release of a manually curated bioinformatics database and analysis resource EHFPI (Essential Host Factors for Pathogenic Infection, http://biotech.bmi.ac.cn/ehfpi). EHFPI captures detailed article, screen, pathogen and phenotype annotation information for a total of 4634 EHF genes of 25 clinically important pathogenic species. Notably, EHFPI also provides six powerful and data-integrative analysis tools, i.e. EHF Overlap Analysis, EHF-pathogen Network Analysis, Gene Enrichment Analysis, Pathogen Interacting Proteins (PIPs) Analysis, Drug Target Analysis and GWAS Candidate Gene Analysis, which advance the comprehensive understanding of the biological roles of EHF genes from the diverse perspectives of protein-protein interaction networks, drug targets and diseases/traits. The EHFPI web interface provides appropriate tools that allow efficient query of EHF data and visualization of custom-made analysis results. EHFPI data and tools will remain available without charge and serve the microbiology, biomedicine and pharmaceutics research communities, ultimately facilitating the development of diagnostics, prophylactics and therapeutics for human pathogens.
Published in January 2015

White adipose tissue reference network: a knowledge resource for exploring health-relevant relations.

Authors: Kelder T, Summer G, Caspers M, van Schothorst EM, Keijer J, Duivenvoorde L, Klaus S, Voigt A, Bohnert L, Pico C, Palou A, Bonet ML, Dembinska-Kiec A, Malczewska-Malec M, Kiec-Wilk B, Del Bas JM, Caimari A, Arola L, van Erk M, van Ommen B, Radonjic M

Abstract: Optimal health is maintained by interaction of multiple intrinsic and environmental factors at different levels of complexity, from molecular to physiological to social. Understanding and quantification of these interactions will aid design of successful health interventions. We introduce the reference network concept as a platform for multi-level exploration of biological relations relevant for metabolic health, by integration and mining of biological interactions derived from public resources and context-specific experimental data. A White Adipose Tissue Health Reference Network (WATRefNet) was constructed as a resource for discovery and prioritization of mechanism-based biomarkers for white adipose tissue (WAT) health status and the effect of food and drug compounds on WAT health status. The WATRefNet (6,797 nodes and 32,171 edges) is based on (1) experimental data obtained from 10 studies addressing different adiposity states, (2) seven public knowledge bases of molecular interactions, (3) experts' definitions of five physiologically relevant processes key to WAT health, namely WAT expandability, Oxidative capacity, Metabolic state, Oxidative stress and Tissue inflammation, and (4) a collection of relevant biomarkers of these processes identified by BIOCLAIMS (http://bioclaims.uib.es). The WATRefNet encompasses multiple layers of biological complexity as it contains various types of nodes and edges that represent different biological levels and interactions. We have validated the reference network by showing overrepresentation of anti-obesity drug targets, pathology-associated genes and differentially expressed genes from an external disease model dataset. The resulting network has been used to extract subnetworks specific to the above-mentioned expert-defined physiological processes. Each of these process-specific signatures represents a mechanistically supported composite biomarker for assessing and quantifying the effect of interventions on a physiological aspect that determines WAT health status. Following this principle, five anti-diabetic drug interventions and one diet intervention were scored for the match of their expression signature to the five biomarker signatures derived from the WATRefNet. This confirmed previous observations of successful intervention by dietary lifestyle and revealed WAT-specific effects of drug interventions. The WATRefNet represents a sustainable knowledge resource for extraction of relevant relationships such as mechanisms of action, nutrient intervention targets and biomarkers and for assessment of health effects for support of health claims made on food products.
Published in January 2015

The semantic web in translational medicine: current applications and future directions.

Authors: Machado CM, Rebholz-Schuhmann D, Freitas AT, Couto FM

Abstract: Semantic web technologies offer an approach to data integration and sharing, even for resources developed independently or broadly distributed across the web. This approach is particularly suitable for scientific domains that profit from large amounts of data that reside in the public domain and that have to be exploited in combination. Translational medicine is such a domain, which in addition has to integrate private data from the clinical domain with proprietary data from the pharmaceutical domain. In this survey, we present the results of our analysis of translational medicine solutions that follow a semantic web approach. We assessed these solutions in terms of their target medical use case; the resources covered to achieve their objectives; and their use of existing semantic web resources for the purposes of data sharing, data interoperability and knowledge discovery. The semantic web technologies seem to fulfill their role in facilitating the integration and exploration of data from disparate sources, but it is also clear that simply using them is not enough. It is fundamental to reuse resources, to define mappings between resources, and to share data and knowledge. All these aspects allow the instantiation of translational medicine at the semantic web-scale, thus resulting in a network of solutions that can share resources for a faster transfer of new scientific results into clinical practice. The envisioned network of translational medicine solutions is on its way, but it still requires resolving the challenges of sharing protected data and of integrating semantic-driven technologies into clinical practice.
Published in January 2015

Continuous production of fenofibrate solid lipid nanoparticles by hot-melt extrusion technology: a systematic study based on a quality by design approach.

Authors: Patil H, Feng X, Ye X, Majumdar S, Repka MA

Abstract: This contribution describes a continuous process for the production of solid lipid nanoparticles (SLN) as drug-carrier systems via hot-melt extrusion (HME). Presently, HME technology has not been used for the manufacturing of SLN. Generally, SLN are prepared as a batch process, which is time-consuming and may result in variability of end-product quality attributes. In this study, using Quality by Design (QbD) principles, we were able to achieve continuous production of SLN by combining two processes: HME technology for melt-emulsification and high-pressure homogenization (HPH) for size reduction. Fenofibrate (FBT), a poorly water-soluble model drug, was incorporated into SLN using HME-HPH methods. The developed novel platform demonstrated better process control and size reduction compared to the conventional process of hot homogenization (batch process). Varying the process parameters enabled the production of SLN below 200 nm. The dissolution profile of the FBT SLN prepared by the novel HME-HPH method was faster than that of the crude FBT and a micronized marketed FBT formulation. At the end of a 5-h in vitro dissolution study, an SLN formulation released 92-93% of drug, whereas drug release was approximately 65 and 45% for the marketed micronized formulation and crude drug, respectively. Also, pharmacokinetic study results demonstrated a statistical increase in Cmax, Tmax, and AUC0-24 h in the rate of drug absorption from SLN formulations as compared to the crude drug and marketed micronized formulation. In summary, the present study demonstrated the potential use of hot-melt extrusion technology for continuous and large-scale production of SLN.
Published in January 2015

CFam: a chemical families database based on iterative selection of functional seeds and seed-directed compound clustering.

Authors: Zhang C, Tao L, Qin C, Zhang P, Chen S, Zeng X, Xu F, Chen Z, Yang SY, Chen YZ

Abstract: Similarity-based clustering and classification of compounds enable the search of drug leads and the structural and chemogenomic studies for facilitating chemical, biomedical, agricultural, material and other industrial applications. A database that organizes compounds into similarity-based as well as scaffold-based and property-based families is useful for facilitating these tasks. The CFam Chemical Family database (http://bidd2.cse.nus.edu.sg/cfam) was developed to hierarchically cluster drugs, bioactive molecules, human metabolites, natural products, patented agents and other molecules into functional families, superfamilies and classes of structurally similar compounds based on the literature-reported high, intermediate and remote similarity measures. The compounds were represented by molecular fingerprints, and molecular similarity was measured by the Tanimoto coefficient. The functional seeds of CFam families were from hierarchically clustered drugs, bioactive molecules, human metabolites, natural products and patented agents, respectively, which were used to characterize families and cluster compounds into families, superfamilies and classes. CFam currently contains 11,643 classes, 34,880 superfamilies and 87,136 families of 490,279 compounds (1691 approved drugs, 1228 clinical trial drugs, 12,386 investigative drugs, 262,881 highly active molecules, 15,055 human metabolites, 80,255 ZINC-processed natural products and 116,783 patented agents). Efforts will be made to further expand the CFam database and add more functional categories and families based on other types of molecular representations.
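For readers unfamiliar with the similarity measure named in this abstract, the Tanimoto coefficient between two binary fingerprints is the number of shared "on" bits divided by the number of bits set in either fingerprint. The sketch below is illustrative only: the abstract does not say which fingerprint type or similarity thresholds CFam uses, so the bit positions and the `tanimoto` helper are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    # |A ∩ B| / |A ∪ B|, with the union size computed by inclusion-exclusion
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical toy fingerprints; the bit positions are made up for illustration.
fp_query = {1, 4, 7, 9}
fp_candidate = {1, 4, 8, 9, 12}

similarity = tanimoto(fp_query, fp_candidate)
print(similarity)
```

Values range from 0 (no shared bits) to 1 (identical fingerprints), which is what makes graded cut-offs such as high, intermediate and remote similarity possible.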
Published in January 2015

ValidatorDB: database of up-to-date validation results for ligands and non-standard residues from the Protein Data Bank.

Authors: Sehnal D, Svobodova Varekova R, Pravda L, Ionescu CM, Geidl S, Horsky V, Jaiswal D, Wimmerova M, Koca J

Abstract: Following the discovery of serious errors in the structure of biomacromolecules, structure validation has become a key topic of research, especially for ligands and non-standard residues. ValidatorDB (freely available at http://ncbr.muni.cz/ValidatorDB) offers a new step in this direction, in the form of a database of validation results for all ligands and non-standard residues from the Protein Data Bank (all molecules with seven or more heavy atoms). Model molecules from the wwPDB Chemical Component Dictionary are used as reference during validation. ValidatorDB covers the main aspects of validation of annotation, and additionally introduces several useful validation analyses. The most significant is the classification of chirality errors, allowing the user to distinguish between serious issues and minor inconsistencies. Other such analyses are able to report, for example, completely erroneous ligands, alternate conformations or complete identity with the model molecules. All results are systematically classified into categories, and statistical evaluations are performed. In addition to detailed validation reports for each molecule, ValidatorDB provides summaries of the validation results for the entire PDB, for sets of molecules sharing the same annotation (three-letter code) or the same PDB entry, and for user-defined selections of annotations or PDB entries.