Publications Search
Explore how scientists all over the world use DrugBank in their research.
Published in January 2008

ChemBank: a small-molecule screening and cheminformatics resource database.

Authors: Seiler KP, George GA, Happ MP, Bodycombe NE, Carrinski HA, Norton S, Brudz S, Sullivan JP, Muhlich J, Serrano M, Ferraiolo P, Tolliday NJ, Schreiber SL, Clemons PA

Abstract: ChemBank (http://chembank.broad.harvard.edu/) is a public, web-based informatics environment developed through a collaboration between the Chemical Biology Program and Platform at the Broad Institute of Harvard and MIT. This knowledge environment includes freely available data derived from small molecules and small-molecule screens and resources for studying these data. ChemBank is unique among small-molecule databases in its dedication to the storage of raw screening data, its rigorous definition of screening experiments in terms of statistical hypothesis testing, and its metadata-based organization of screening experiments into projects involving collections of related assays. ChemBank stores an increasingly varied set of measurements derived from cells and other biological assay systems treated with small molecules. Analysis tools are available and are continuously being developed that allow the relationships between small molecules, cell measurements, and cell states to be studied. Currently, ChemBank stores information on hundreds of thousands of small molecules and hundreds of biomedically relevant assays that have been performed at the Broad Institute by collaborators from the worldwide research community. The goal of ChemBank is to provide life scientists unfettered access to biomedically relevant data and tools heretofore available primarily in the private sector.
Published on January 23, 2008

Gene characterization index: assessing the depth of gene annotation.

Authors: Kemmer D, Podowski RM, Yusuf D, Brumm J, Cheung W, Wahlestedt C, Lenhard B, Wasserman WW

Abstract: BACKGROUND: We introduce the Gene Characterization Index, a bioinformatics method for scoring the extent to which a protein-encoding gene is functionally described. Inherently a reflection of human perception, the Gene Characterization Index is applied to assess the characterization status of individual genes. It thus serves the advancement of both genome annotation and applied genomics research through rapid and unbiased identification of groups of uncharacterized genes for diverse applications, such as directed functional studies and delineation of novel drug targets. METHODOLOGY/PRINCIPAL FINDINGS: The scoring procedure is based on a global survey of researchers, who assigned characterization scores from 1 (poor) to 10 (extensive) for a sample of genes based on major online resources. Using the survey as training data, we developed a bioinformatics procedure to assign gene characterization scores to all genes in the human genome. We analyzed snapshots of functional genome annotation over a period of 6 years to assess temporal changes reflected by the increase of the average Gene Characterization Index. Applying the Gene Characterization Index to genes within pharmaceutically relevant classes, we confirmed known drug targets as high-scoring genes and revealed potentially interesting novel targets with low characterization indexes. After removing known drug targets and genes linked to sequence-related patent filings from the entirety of indexed genes, we identified sets of low-scoring genes particularly suited for further experimental investigation. CONCLUSIONS/SIGNIFICANCE: The Gene Characterization Index is intended to serve as a tool that helps the scientific community and granting agencies focus resources and efforts on unexplored areas of the genome. The Gene Characterization Index is available from http://cisreg.ca/gci/.
Published in December 2007

Computational models to assign biopharmaceutics drug disposition classification from molecular structure.

Authors: Khandelwal A, Bahadduri PM, Chang C, Polli JE, Swaan PW, Ekins S

Abstract: PURPOSE: We applied in silico methods to automatically classify drugs according to the Biopharmaceutics Drug Disposition Classification System (BDDCS). MATERIALS AND METHODS: Models were developed using machine learning methods, including recursive partitioning (RP), random forest (RF) and support vector machine (SVM) algorithms, with ChemDraw, clogP, polar surface area, VolSurf and MolConnZ descriptors. The dataset consisted of 165 training set and 56 test set molecules. RESULTS: RF model 3, RP model 1 and SVM model 1 correctly predicted 73.1%, 63.6% and 78.6% of test compounds in classes 1, 2 and 3, respectively. Both RP and SVM models can be used for class 4 prediction. The inclusion of consensus analysis improved test set predictions for class 2 and 4 drugs. CONCLUSIONS: The models can be used to predict the BDDCS class of new compounds from molecular structure using readily available molecular descriptors and software. This is an area where in silico approaches could help the pharmaceutical industry bring drugs to patients faster and at lower cost. It could have significant applications in drug discovery for identifying molecules that may have future developability issues.
Published on December 10, 2007

Quantitative analysis on the characteristics of targets with FDA approved drugs.

Authors: Sakharkar MK, Li P, Zhong Z, Sakharkar KR

Abstract: Accumulated knowledge of genomic information, systems biology and disease mechanisms provides an unprecedented opportunity to elucidate the genetic basis of diseases and to discover novel therapeutic targets from the wealth of genomic data. With hundreds to a few thousand potential targets available in the human genome alone, target selection and validation have become critical components of the drug discovery process. Exploring the quantitative characteristics of currently explored targets (those without any marketed drug) and successful targets (those targeted by at least one marketed drug) could help discern simple rules for selecting a putative successful target. Here we use integrative in silico (computational) approaches to quantitatively analyze the characteristics of 133 targets with FDA approved drugs and 3120 human disease genes (therapeutic targets) not targeted by FDA approved drugs. This is the first attempt to comparatively analyze targets with FDA approved drugs and targets with no FDA approved drug or no drugs available for them. Our results show that three groups are more likely to be targetable: proteins with 5 or fewer homologs outside their own family, proteins with a single-exon gene architecture and proteins interacting with more than 3 partners. These quantitative characteristics could serve as criteria in the search for promising targetable disease genes.
Published in November 2007

In silico elucidation of the molecular mechanism defining the adverse effect of selective estrogen receptor modulators.

Authors: Xie L, Wang J, Bourne PE

Abstract: Early identification of the adverse effects of preclinical and commercial drugs is crucial in developing highly efficient therapeutics, since unexpected adverse drug effects account for one-third of all drug failures in drug development. To correlate protein-drug interactions at the molecular level with their clinical outcomes at the organism level, we have developed an integrated approach to studying protein-ligand interactions on a structural proteome-wide scale. It combines protein functional site similarity search, small molecule screening and protein-ligand binding affinity profile analysis. By applying this methodology, we have elucidated a possible molecular mechanism for the previously observed, but molecularly uncharacterized, side effect of selective estrogen receptor modulators (SERMs). The side effect involves the inhibition of the transmembrane domain of the Sarcoplasmic Reticulum Ca²⁺ ion channel ATPase protein (SERCA). The prediction provides molecular insight into reducing the adverse effect of SERMs and is supported by clinical and in vitro observations. The strategy used in this case study is being applied to discover off-targets for other commercially available pharmaceuticals. The process can be included in a drug discovery pipeline to optimize drug leads and reduce unwanted side effects.
Published in September 2007

In silico pharmacology for drug discovery: applications to targets and beyond.

Authors: Ekins S, Mestres J, Testa B

Abstract: Computational (in silico) methods have been developed and widely applied to pharmacology hypothesis development and testing. These in silico methods include databases, quantitative structure-activity relationships, similarity searching, pharmacophores, homology models and other molecular modeling, machine learning, data mining, network analysis tools and data analysis tools that use a computer. Such methods have seen frequent use in the discovery and optimization of novel molecules with affinity to a target, the clarification of absorption, distribution, metabolism, excretion and toxicity properties as well as physicochemical characterization. The first part of this review discussed the methods that have been used for virtual ligand and target-based screening and profiling to predict biological activity. The aim of this second part of the review is to illustrate some of the varied applications of in silico methods for pharmacology in terms of the targets addressed. We will also discuss some of the advantages and disadvantages of in silico methods with respect to in vitro and in vivo methods for pharmacology research. Our conclusion is that the in silico pharmacology paradigm is ongoing and presents a rich array of opportunities that will assist in expediting the discovery of new targets, and ultimately lead to compounds with predicted biological activity for these novel targets.
Published on September 20, 2007

Prediction of potential drug targets based on simple sequence properties.

Authors: Li Q, Lai L

Abstract: BACKGROUND: Over the past decades, research and development in drug discovery have attracted much attention and effort. However, to date only 324 drug targets are known for clinical drugs. Identifying potential drug targets is the first step in modern drug discovery for developing novel therapeutic agents. The identification and validation of new and effective drug targets are therefore of great value for drug discovery in both academia and the pharmaceutical industry. If a protein's potential as a drug target can be predicted in advance, the drug discovery process targeting that protein will be greatly accelerated. In the current study, we have developed a sequence-based method, built on the properties of known drug targets, for fast identification of novel drug targets. RESULTS: Several support vector machine models were constructed from simple physicochemical properties extracted from the protein sequences of known drug targets. The best model distinguishes currently known drug targets from non-drug targets with an accuracy of 84%. Using this model, we predicted potential protein drug targets of human origin from Swiss-Prot, some of which have already attracted much attention as potential drug targets in pharmaceutical research. CONCLUSION: We have developed a drug target prediction method based solely on protein sequence information, without knowledge of family/domain annotation or the protein 3D structure. This method can be applied to novel drug target identification and validation, as well as to genome-scale drug target prediction.
Published on August 20, 2007

An integrative in silico approach for discovering candidates for drug-targetable protein-protein interactions in interactome data.

Authors: Sugaya N, Ikeda K, Tashiro T, Takeda S, Otomo J, Ishida Y, Shiratori A, Toyoda A, Noguchi H, Takeda T, Kuhara S, Sakaki Y, Iwayanagi T

Abstract: BACKGROUND: Protein-protein interactions (PPIs) are challenging but attractive targets for small chemical drugs. Complete sets of PPIs, called 'interactomes', have emerged for several organisms, including human, thanks to the recent development of high-throughput screening (HTS) technologies. Individual PPIs have been targeted by small drug-like chemicals (SDCs). However, interactome data have not been fully used to explore drug targets because there is no comprehensive methodology for using these data. Here we propose an integrative in silico approach for discovering candidates for drug-targetable PPIs in interactome data. RESULTS: Our novel in silico screening system comprises three independent assessment procedures: i) detection of protein domains responsible for PPIs, ii) finding SDC-binding pockets on protein surfaces, and iii) evaluating similarities in the assignment of Gene Ontology (GO) terms between specific partner proteins. We discovered six candidates for drug-targetable PPIs by applying our in silico approach to original human PPI data. These data comprise 770 binary interactions produced by our HTS yeast two-hybrid (HTS-Y2H) assays. Among the candidates, we further examined two, RXRA/NRIP1 and CDK2/CDKN1A, with respect to their biological roles, the PPI network around each candidate and the tertiary structures of the interacting domains. CONCLUSION: An integrative in silico approach for discovering candidates for drug-targetable PPIs was applied to original human PPI data. The system excludes false positive interactions and selects reliable PPIs as drug targets. Its effectiveness was demonstrated by the discovery of six promising candidate target PPIs. Inhibition or stabilization of the two examined interactions may have therapeutic effects against human diseases.
Published on August 1, 2007

PepBank--a database of peptides based on sequence text mining and public peptide data sources.

Authors: Shtatland T, Guettler D, Kossodo M, Pivovarov M, Weissleder R

Abstract: BACKGROUND: Peptides are important molecules with diverse biological functions and biomedical uses. To date, there is no single, searchable archive for peptide sequences or associated biological data. Instead, peptide sequences still have to be mined from abstracts and full-length articles and/or obtained from fragmented public sources. DESCRIPTION: We have constructed a new database (PepBank), which at the time of writing contains a total of 19,792 individual peptide entries. The database has a web-based user interface with a simple, Google-like search function, advanced text search, and BLAST and Smith-Waterman search capabilities. The major source of peptide sequence data is text mining of MEDLINE abstracts. Another component of the database is peptide sequence data from public sources (ASPD and UniProt). An additional, smaller part of the database is manually curated from sets of full text articles and text mining results. We show the utility of the database in different examples of affinity ligand discovery. CONCLUSION: We have created and maintain a database of peptide sequences. The database has biological and medical applications. These include predicting the binding partners of biologically interesting peptides, developing peptide-based therapeutic or diagnostic agents, and predicting the molecular targets or binding specificities of peptides resulting from phage display selection. The database is freely available at http://pepbank.mgh.harvard.edu/. The text mining source code (Peptide::Pubmed) is freely available there as well as on CPAN (http://www.cpan.org/).
Published in July 2007

Development and validation of a physiology-based model for the prediction of oral absorption in monkeys.

Authors: Willmann S, Edginton AN, Dressman JB

Abstract: PURPOSE: The development and validation of a physiology-based absorption model for orally administered drugs in monkeys is described. MATERIALS AND METHODS: Physiological parameters affecting intestinal transit and absorption of an orally administered drug in monkeys were collected from the literature. They were implemented in a physiological model for passive absorption previously developed for rats and humans. Predicted fractions of dose absorbed were compared to experimentally observed values for a set of N = 37 chemically diverse drugs. A sensitivity analysis was performed to assess the influence of various physiological model parameters on the predicted fraction of dose absorbed. RESULTS: For compounds undergoing non-solubility-limited passive absorption (N = 29), the predicted and observed fractions of dose absorbed in monkeys gave a Pearson's correlation coefficient of 0.94 (95% confidence interval: [0.88, 0.97]; p < 0.0001). The sensitivity analysis revealed that the predicted fractions of dose absorbed in monkeys are very sensitive to inter-individual variations in small intestinal transit time. CONCLUSIONS: The model is well suited to predict the fraction of dose absorbed for passively absorbed compounds after oral administration. It is also well suited to assess the influence of inter-individual physiological variability on oral absorption in monkeys.