Publications Search
Explore how scientists all over the world use DrugBank in their research.
Published in August 2009

Tunable machine vision-based strategy for automated annotation of chemical databases.

Authors: Park J, Rosania GR, Saitou K

Abstract: We present a tunable, machine vision-based strategy for automated annotation of virtual small molecule databases. The proposed strategy combines three components. The first is a machine vision-based tool that extracts structure diagrams from research articles and converts them into connection tables. The second is a virtual "Chemical Expert" system that screens the converted structures at adjustable levels of estimated conversion accuracy. The third is a fragment-based measure of intermolecular similarity. For annotation, the calculated chemical similarity between the converted structures and entries in a virtual small molecule database is used to establish the links. Overall annotation performance can be tuned by adjusting the cutoff threshold on the estimated conversion accuracy. We perform an annotation test that attempts to link 121 journal articles registered in PubMed to entries in PubChem, the largest publicly accessible chemical database. Two tests are performed, and their results are compared to see how overall annotation performance is affected by different threshold levels of the estimated accuracy of the converted structures. Our work demonstrates that over 45% of the articles could have true positive links to entries in the PubChem database, with promising recall and precision rates in both tests. Furthermore, we illustrate that the Chemical Expert system, which screens converted structures at adjustable levels of estimated conversion accuracy, is a key factor in overall annotation performance. We propose that this machine vision-based strategy can be combined with text-mining approaches to facilitate the extraction of contextual scientific knowledge about a chemical structure from the scientific literature.
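The linking step described above can be illustrated with a short sketch. It assumes each structure has already been reduced to a set of fragment identifiers and uses the Tanimoto coefficient as the fragment-based similarity. The abstract does not specify the authors' measure, so the fragment encoding, the function names `tanimoto` and `annotate`, both cutoff values, and the toy database entries are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fragment sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def annotate(converted, database, accuracy_cutoff=0.8, similarity_cutoff=0.9):
    """Link converted structures to database entries (hypothetical sketch).

    converted: list of (fragment_set, estimated_accuracy) pairs from image conversion.
    database:  dict mapping entry ID -> fragment_set.
    Structures below accuracy_cutoff are screened out first; each remaining
    structure is linked to every entry whose similarity meets similarity_cutoff.
    """
    links = set()
    for fragments, accuracy in converted:
        if accuracy < accuracy_cutoff:
            continue  # conversion judged too unreliable to use
        for entry_id, entry_fragments in database.items():
            if tanimoto(fragments, entry_fragments) >= similarity_cutoff:
                links.add(entry_id)
    return links


# Toy data: fragment strings and entry IDs are made up for illustration.
db = {"CID_1": {"c1ccccc1", "C=O", "N"}, "CID_2": {"C=O", "O"}}
extracted = [({"c1ccccc1", "C=O", "N"}, 0.95), ({"C=O", "O"}, 0.40)]

print(annotate(extracted, db))                       # {'CID_1'}
print(annotate(extracted, db, accuracy_cutoff=0.3))  # links both entries
```

Lowering `accuracy_cutoff` admits the second, less reliable conversion and produces an extra link. This mirrors the trade-off between screening strictness and coverage that the abstract describes.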
Published in July 2009

Harvesting candidate genes responsible for serious adverse drug reactions from a chemical-protein interactome.

Authors: Yang L, Chen J, He L

Abstract: Identifying the genetic factors responsible for serious adverse drug reactions (SADRs) is of critical importance to personalized medicine. However, genome-wide association studies are hampered by the lack of case-control samples, and the selection of candidate genes is limited by a poor understanding of the mechanisms underlying SADRs. We hypothesize that drugs causing the same type of SADR might share a common mechanism by unexpectedly targeting the same SADR-mediating protein. Hence we propose identifying common SADR targets by constructing and mining an in silico chemical-protein interactome (CPI): a matrix of binding strengths between 162 drug molecules known to cause at least one type of SADR and 845 proteins. Drugs sharing the same SADR outcome were also found to have similar CPI profiles across this set of 845 proteins. This methodology identified a candidate gene for sulfonamide-induced toxic epidermal necrolysis (TEN): all nine sulfonamides that cause TEN were found to bind strongly to MHC I (Cw*4), whereas none of the 17 control drugs that do not cause TEN were found to bind to it. Further analysis of the CPI showed that the Y116S substitution of MHC I (B*5703) enhances the unexpected binding of abacavir to its antigen presentation groove, which explains why B*5701, not B*5703, is the risk allele for abacavir-induced hypersensitivity. In conclusion, SADR targets and patient-specific off-targets could be identified through a systematic investigation of the CPI, generating important hypotheses for prospective experimental validation of candidate genes.
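The selection criterion reported for TEN can be expressed as a filter over a drug-by-protein score matrix: every case drug binds strongly, and no control drug does. This is a minimal sketch, not the authors' pipeline. The dictionary layout, the assumption that higher scores mean stronger binding, the single global threshold, the function name, and the toy scores are all hypothetical.

```python
def candidate_sadr_targets(cpi, cases, controls, threshold):
    """Return proteins bound strongly by every case drug and by no control drug.

    cpi: dict mapping drug name -> {protein name: binding score}.
    Assumes higher scores mean stronger predicted binding (hypothetical convention).
    cases, controls: lists of drug names present in cpi.
    """
    # Only consider proteins scored for every drug involved.
    proteins = set.intersection(*(set(cpi[d]) for d in cases + controls))
    hits = []
    for p in sorted(proteins):
        all_cases_bind = all(cpi[d][p] >= threshold for d in cases)
        no_control_binds = all(cpi[d][p] < threshold for d in controls)
        if all_cases_bind and no_control_binds:
            hits.append(p)
    return hits


# Made-up scores for illustration only.
cpi = {
    "drug_A": {"P1": 0.9, "P2": 0.2},
    "drug_B": {"P1": 0.8, "P2": 0.7},
    "ctrl_X": {"P1": 0.1, "P2": 0.9},
}
print(candidate_sadr_targets(cpi, ["drug_A", "drug_B"], ["ctrl_X"], threshold=0.5))  # ['P1']
```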
Published in July 2009

Building disease-specific drug-protein connectivity maps from molecular interaction networks and PubMed abstracts.

Authors: Li J, Zhu X, Chen JY

Abstract: The recently proposed concept of molecular connectivity maps enables researchers to integrate experimental measurements of genes, proteins, metabolites, and drug compounds under similar biological conditions. The study of these maps provides opportunities for future toxicogenomics and drug discovery applications. We developed a computational framework to build disease-specific drug-protein connectivity maps. We integrated gene/protein and drug connectivity information based on protein interaction networks and literature mining, without requiring gene expression profiles derived from drug perturbation experiments on disease samples. We describe the development and application of this framework in three steps, using Alzheimer's Disease (AD) as the primary example. First, molecular interaction networks were incorporated to reduce bias and improve the relevance of AD seed proteins. Second, PubMed abstracts were used to retrieve enriched drug terms that are indirectly associated with AD through molecular mechanistic studies. Third, a comprehensive AD connectivity map was created by relating the enriched drugs to their associated proteins in the literature. We showed that this approach to building molecular connectivity maps outperformed both curated drug target databases and conventional information retrieval systems. Our initial exploration of the AD connectivity map yielded a new hypothesis: diltiazem and quinidine may be worth investigating as candidate drugs for AD treatment. Computationally derived molecular connectivity maps can help study differences in molecular signatures between classes of drugs in specific disease contexts. To achieve good overall data coverage and quality, a series of statistical methods was developed to overcome high levels of noise in biological networks and literature-mining results. Further development of computational molecular connectivity maps covering major disease areas will likely establish a new model for drug development, in which the therapeutic and toxicological profiles of candidate drugs can be checked computationally before costly clinical trials begin.
Published on July 30, 2009

Discovery: an interactive resource for the rational selection and comparison of putative drug target proteins in malaria.

Authors: Joubert F, Harrison CM, Koegelenberg RJ, Odendaal CJ, de Beer TA

Abstract: BACKGROUND: Up to half a billion human clinical cases of malaria are reported each year, resulting in about 2.7 million deaths, most of which occur in sub-Saharan Africa. Owing to the overuse and misuse of antimalarials, resistance to all known drugs is spreading at an alarming rate. Rational methods for selecting new drug target proteins and lead compounds are urgently needed. The Discovery system provides data-mining functionality over extensive annotations of five malaria species together with their human and mosquito hosts, enabling the selection of new targets based on multiple protein and ligand properties. METHODS: A web-based system was developed in which researchers can mine information on malaria proteins and predicted ligands and compare them with human and mosquito host characteristics. Protein features include domains, motifs, EC numbers, GO terms, orthologs, protein-protein interactions, protein-ligand interactions, and host-pathogen interactions, among others. Searching by chemical structure is also available. RESULTS: An in silico system for selecting putative drug targets and lead compounds is presented, together with an example study on the bifunctional DHFR-TS from Plasmodium falciparum. CONCLUSION: The Discovery system allows putative drug targets and lead compounds in Plasmodium species to be identified by filtering on protein and chemical properties.
Published on July 27, 2009

CanGeneBase (CGB)--a database on cancer related genes.

Authors: Kumar GR, Subazini TK, Subha K, Rajadurai CP, Prabakar L

Abstract: UNLABELLED: The advent of genomic and proteomic technologies in the post-genomic era has prompted researchers to develop novel strategies against cancer that target human genes, which should help identify more promising treatments and develop accurate early diagnosis of cancer. To harness cancer genetic information for better treatment, we have developed a cancer gene database called CanGeneBase (CGB). It is a comprehensive collection of data on cancer-related genes, intended to give researchers a single platform for obtaining detailed information on the genes of interest. According to the Cancer Gene Data Curation Project, about 4,700 genes have been identified as cancer-related. The present version of CanGeneBase covers about 12 types of cancer and includes 190 unique gene entries. Each entry contains about 33 parameters describing the specific gene. CanGeneBase can be queried either by gene symbol or by cancer type. AVAILABILITY: The database is freely available at http://122.165.25.137/bioinfo/cancerdb/
Published on July 9, 2009

Hmrbase: a database of hormones and their receptors.

Authors: Rashid M, Singla D, Sharma A, Kumar M, Raghava GP

Abstract: BACKGROUND: Hormones are signaling molecules that play vital roles in various life processes, such as growth and differentiation, physiology, and reproduction. These molecules are mostly secreted by endocrine glands and transported to target organs through the bloodstream. Deficient or excessive hormone levels are associated with several diseases, such as cancer, osteoporosis, and diabetes. It is therefore important to collect and compile information about hormones and their receptors. DESCRIPTION: This manuscript describes Hmrbase, a database developed for managing information about hormones and their receptors. It is a highly curated database whose information has been collected from the literature and from public databases. The current version of Hmrbase contains comprehensive information on approximately 2000 hormones, including their function, source organism, receptors, mature sequences, and structures. Hmrbase also contains information on approximately 3000 hormone receptors, including amino acid sequences, subcellular localizations, ligands, and post-translational modifications. A major feature of this database is that it provides data on approximately 4100 hormone-receptor pairs. A number of online tools have been integrated into the database, providing keyword search, structure-based search, mapping of given peptides onto hormone/receptor sequences, and sequence similarity search. The database also provides external links to other resources and databases to help users retrieve further related information. CONCLUSION: Owing to the high impact of endocrine research in the biomedical sciences, Hmrbase could become a leading data portal for researchers. The salient features of Hmrbase are hormone-receptor pair information, mapping of peptide stretches onto the protein sequences of hormones and receptors, Pfam domain annotations, categorical browsing options, online data submission, and DrugPedia linkage. Hmrbase is publicly available online at http://crdd.osdd.net/raghava/hmrbase/.
Published in June 2009

Integrating statistical predictions and experimental verifications for enhancing protein-chemical interaction predictions in virtual screening.

Authors: Nagamine N, Shirakawa T, Minato Y, Torii K, Kobayashi H, Imoto M, Sakakibara Y

Abstract: Predicting interactions between target proteins and potential leads is of great benefit in the drug discovery process. We present a comprehensively applicable statistical method for predicting interactions between any proteins and chemical compounds; it requires only protein sequence data and chemical structure data and uses support vector machines (SVMs) as the statistical learning method. Because comprehensive predictions can involve many false positives, we propose two approaches for reducing them: (i) efficient use of multiple statistical prediction models in a two-layer SVM framework and (ii) careful design of the negative data used to construct the prediction models. In the two-layer SVM, the outputs of the first-layer SVM models, which are constructed with different negative samples and reflect different aspects of the classification, are used as inputs to the second-layer SVM. To design negative data that produce fewer false positive predictions, we iteratively construct SVM models (classification boundaries) from positive and tentative negative samples and select additional negative sample candidates according to predetermined rules. Moreover, to fully exploit the advantages of statistical learning methods, we propose a strategy for effectively feeding experimental results back into computational predictions, taking the biological effects of interest into account. We show the usefulness of our approach by predicting potential ligands binding to the human androgen receptor from more than 19 million chemical compounds and verifying these predictions by in vitro binding. We then use this experimental validation as feedback to enhance subsequent computational predictions and validate these predictions experimentally again. Iterating between in silico prediction and in vitro or in vivo experimental verification, with sufficient feedback, enabled us to identify novel ligand candidates that were distant from known ligands in chemical space.
Published in June 2009

Effects of ionic strength on passive and iontophoretic transport of cationic permeant across human nail.

Authors: Smith KA, Hao J, Li SK

Abstract: PURPOSE: Transport across the hydrated human nail can be modeled as hindered transport across aqueous pore pathways. As such, nail permselectivity to charged species can be manipulated by changing the ionic strength of the system in transungual delivery for treating nail diseases. The present study investigated the effects of ionic strength on transungual passive and iontophoretic transport. METHODS: Transungual passive and anodal iontophoretic transport experiments with tetraethylammonium ion (TEA) were conducted in vitro under symmetric conditions, in which the donor and receiver had the same ionic strength. Experiments under asymmetric conditions were performed to mimic in vivo conditions. Before the transport studies, TEA uptake studies were performed to assess the partitioning of TEA into the nail. RESULTS: Permselectivity towards TEA was inversely related to ionic strength in both passive and iontophoretic transport. Under symmetric conditions, the permeability and transference number of TEA were higher at lower ionic strengths owing to increased partitioning of TEA into the nail. Transference numbers were smaller under asymmetric conditions than under the corresponding symmetric conditions. CONCLUSIONS: The results demonstrate significant ionic strength effects on the partitioning and transport of a cationic permeant in transungual transport, which may be instrumental in the development of transungual delivery systems.
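Ionic strength, the variable manipulated in this study, is defined as I = ½ Σ cᵢzᵢ² over all ionic species in solution. The short sketch below computes it. The example solutions are hypothetical and are not the buffers used in the study. They show that a 1:1 salt (such as NaCl, or a salt of the monovalent TEA cation) has an ionic strength equal to its molar concentration, whereas divalent ions contribute four times as much per mole.

```python
def ionic_strength(ions):
    """Ionic strength I = 1/2 * sum(c_i * z_i**2).

    ions: iterable of (concentration in mol/L, charge number) pairs, one per ionic species.
    Returns I in mol/L.
    """
    return 0.5 * sum(c * z ** 2 for c, z in ions)


# Hypothetical solutions, not the buffers used in the study:
print(round(ionic_strength([(0.1, +1), (0.1, -1)]), 4))  # 0.1 M NaCl  -> 0.1
print(round(ionic_strength([(0.1, +2), (0.2, -1)]), 4))  # 0.1 M CaCl2 -> 0.3
```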
Published on June 16, 2009

A chemogenomics view on protein-ligand spaces.

Authors: Strombergsson H, Kleywegt GJ

Abstract: BACKGROUND: Chemogenomics is an emerging interdisciplinary approach to drug discovery that combines traditional ligand-based approaches with biological information on drug targets and lies at the interface of chemistry, biology and informatics. The ultimate goal of chemogenomics is to understand molecular recognition between all possible ligands and all possible drug targets. Protein space and ligand space have previously been studied as separate entities, but chemogenomics studies deal with large datasets that cover parts of the joint protein-ligand space. Because drug discovery has traditionally focused on ligand optimization, chemical space has been studied extensively. Protein space has been studied to some extent, typically to classify proteins into functional and structural classes. Since chemogenomics deals not only with ligands but also with the macromolecules they interact with, it is of interest to find ways to explore, compare and visualize protein-ligand subspaces. RESULTS: Two chemogenomics protein-ligand interaction datasets were prepared for this study. The first covers the known structural protein-ligand space and includes all non-redundant protein-ligand interactions found in the worldwide Protein Data Bank (PDB). The second contains all approved drugs and drug targets stored in the DrugBank database and represents the approved drug-drug target space. To capture biological and physicochemical features of the datasets, sequence-based descriptors were computed for the proteins, and zero-, one- and two-dimensional descriptors for the ligands. Principal component analysis (PCA) was used to analyze the multidimensional data and to create global models of protein-ligand space. A nearest-neighbour method, computed on the principal components, was used to measure the overlap between the datasets. CONCLUSION: In this study, we present an approach to visualizing protein-ligand spaces from a chemogenomics perspective, in which both ligand and protein features are taken into account. The method can be applied to any protein-ligand interaction dataset. Here, the approach is applied to analyze the structural protein-ligand space and the protein-ligand space of all approved drugs and their targets. We show that this approach can be used to visualize and compare chemogenomics datasets, and possibly to identify cross-interaction complexes in protein-ligand space.
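One simple way to turn nearest neighbours into an overlap score is sketched below. For each point of dataset A, it checks whether that point's nearest neighbour in the pooled data belongs to dataset B. This is an illustrative measure and not necessarily the one used in the study. It assumes the points have already been projected onto principal components, so the PCA step itself is omitted. The function name and toy coordinates are hypothetical.

```python
import math


def nn_overlap(a, b):
    """Fraction of points in `a` whose nearest neighbour in the pooled set is from `b`.

    a, b: lists of coordinate tuples, e.g. scores on the first principal components.
    Values near 0 mean `a` sits apart from `b`; higher values mean the sets intermix.
    """
    pooled = [(p, "a") for p in a] + [(p, "b") for p in b]
    hits = 0
    for i, p in enumerate(a):
        # Points of `a` come first in `pooled`, so index i is p itself; skip it.
        nearest = min(
            (item for j, item in enumerate(pooled) if j != i),
            key=lambda item: math.dist(p, item[0]),
        )
        if nearest[1] == "b":
            hits += 1
    return hits / len(a)


# Toy 2-D coordinates standing in for PC1/PC2 scores.
structural = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
approved = [(5.1, 5.0), (9.0, 9.0)]
print(nn_overlap(structural, approved))  # 1 of 3 points has a nearest neighbour in `approved`
```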
Published in May–June 2009

Potential aggregation prone regions in biotherapeutics: A survey of commercial monoclonal antibodies.

Authors: Wang X, Das TK, Singh SK, Kumar S

Abstract: Aggregation of biotherapeutics is a significant concern, and judicious process and formulation development is required to minimize aggregate levels in the final product. Aggregation of a protein in solution is driven by intrinsic and extrinsic factors. In this work we focus on aggregation as an intrinsic property of the molecule. We studied the sequences and Fab structures of commercial and non-commercial antibodies for their vulnerability to aggregation, using sequence-based computational tools to identify potential aggregation-prone motifs or regions. The mAbs in our dataset contain 2 to 8 aggregation-prone motifs per heavy and light chain pair. Some of these motifs are located in variable domains, primarily in CDRs. Most aggregation-prone motifs are rich in beta-branched aliphatic and aromatic residues. Hydroxyl-containing Ser/Thr residues are also found in several aggregation-prone motifs, whereas charged residues are rare. The motifs found in light chain CDR3 are glutamine (Q)/asparagine (N) rich. These motifs are similar to the reported aggregation-promoting regions of prion and amyloidogenic proteins, which are also rich in Q/N, aliphatic and aromatic residues. This suggests that one possible mechanism of mAb aggregation is the formation of cross-beta structures and fibrils. Mapping onto the available Fab-receptor/antigen complex structures reveals that the motifs in CDRs might also contribute significantly to receptor/antigen binding. Our analysis identifies the opportunity, and the tools, for simultaneously optimizing a therapeutic protein sequence for potency and specificity while reducing its vulnerability to aggregation.
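The residue patterns reported above can be turned into a crude sliding-window scan. This is a toy heuristic for illustration and not one of the sequence-based tools used in the study. The residue classes (V/I as beta-branched aliphatic, F/Y/W as aromatic, D/E/K/R as charged), the window size, the minimum count, the function name, and the test sequence are all assumptions. Q/N-rich motifs are not covered by this rule.

```python
BETA_BRANCHED_ALIPHATIC = set("VI")  # Val, Ile
AROMATIC = set("FYW")                # Phe, Tyr, Trp
CHARGED = set("DEKR")                # Asp, Glu, Lys, Arg


def aggregation_prone_regions(seq, window=5, min_enriched=4):
    """Return (start, end) spans, 0-based with end exclusive, of windows that contain
    at least min_enriched beta-branched aliphatic or aromatic residues and no charged
    residue. Overlapping or adjacent flagged windows are merged into one region."""
    seq = seq.upper()
    regions = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        enriched = sum(r in BETA_BRANCHED_ALIPHATIC or r in AROMATIC for r in segment)
        if enriched >= min_enriched and not (CHARGED & set(segment)):
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the current region
            else:
                regions.append((start, end))
    return regions


toy = "GSKDVIFYVTQDE"  # made-up sequence, not from a real antibody
for s, e in aggregation_prone_regions(toy):
    print(s, e, toy[s:e])  # 4 10 VIFYVT
```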