Publications Search
Explore how scientists all over the world use DrugBank in their research.
Published on May 9, 2011

PubChem3D: Similar conformers.

Authors: Bolton EE, Kim S, Bryant SH

Abstract: BACKGROUND: PubChem is a free and open public resource for the biological activities of small molecules. With many tens of millions of both chemical structures and biological test results, PubChem is a sizeable system with an uneven degree of available information. Some chemical structures in PubChem include a great deal of biological annotation, while others have little to none. To help users, PubChem pre-computes "neighboring" relationships to relate similar chemical structures, which may have similar biological function. In this work, we introduce a "Similar Conformers" neighboring relationship to identify compounds with similar 3-D shape and similar 3-D orientation of functional groups typically used to define pharmacophore features. RESULTS: The first two diverse 3-D conformers of 26.1 million PubChem Compound records were compared to each other, using a shape Tanimoto (ST) of 0.8 or greater and a color Tanimoto (CT) of 0.5 or greater, yielding 8.16 billion conformer neighbor pairs and 6.62 billion compound neighbor pairs, with an average of 253 "Similar Conformers" compound neighbors per compound. Comparing the 3-D neighboring relationship to the corresponding 2-D neighboring relationship ("Similar Compounds") for molecules such as caffeine, aspirin, and morphine, one finds unique sets of related chemical structures, providing additional significant biological annotation. The PubChem 3-D neighboring relationship is also shown to be able to group a set of non-steroidal anti-inflammatory drugs (NSAIDs), despite limited PubChem 2-D similarity. In a study of 4,218 chemical structures of biomedical interest, consisting of many known drugs, using more diverse conformers per compound results in more 3-D compound neighbors per compound; however, the overlap of the compound neighbor lists per conformer also increasingly resemble each other, being 38% identical at three conformers and 68% at ten conformers. Perhaps surprising is that the average count of conformer neighbors per conformer increases rather slowly as a function of diverse conformers considered, with only a 70% increase for a ten times growth in conformers per compound (a 68-fold increase in the conformer pairs considered). Neighboring 3-D conformers on the scale performed, if implemented naively, is an intractable problem using a modest sized compute cluster. Methodology developed in this work relies on a series of filters to prevent performing 3-D superposition optimization, when it can be determined that two conformers cannot possibly be a neighbor. Most filters are based on Tanimoto equation volume constraints, avoiding incompatible conformers; however, others consider preliminary superposition between conformers using reference shapes. CONCLUSION: The "Similar Conformers" 3-D neighboring relationship locates similar small molecules of biological interest that may go unnoticed when using traditional 2-D chemical structure graph-based methods, making it complementary to such methodologies. The computational cost of 3-D similarity methodology on a wide scale, such as PubChem contents, is a considerable issue to overcome. Using a series of efficient filters, an effective throughput rate of more than 150,000 conformers per second per processor core was achieved, more than two orders of magnitude faster than without filtering.
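
The volume-constraint idea in the abstract above can be illustrated with a minimal sketch. It assumes the usual shape Tanimoto form, ST = V_AB / (V_A + V_B - V_AB), where V_AB is the overlap volume; because V_AB can never exceed the smaller of the two self-volumes, ST can never exceed min(V_A, V_B) / max(V_A, V_B). The function names and the example volumes are hypothetical and are not taken from the PubChem3D implementation.

```python
def shape_tanimoto(v_a, v_b, v_ab):
    """Shape Tanimoto from two self-overlap volumes and their mutual overlap volume."""
    return v_ab / (v_a + v_b - v_ab)


def max_possible_st(v_a, v_b):
    """Upper bound on ST: the overlap volume cannot exceed the smaller self-volume."""
    return min(v_a, v_b) / max(v_a, v_b)


def can_be_neighbors(v_a, v_b, st_threshold=0.8):
    """Cheap pre-filter (illustrative): skip superposition if the bound is below the threshold."""
    return max_possible_st(v_a, v_b) >= st_threshold


# Hypothetical volumes: the first pair can be rejected without any 3-D superposition.
print(can_be_neighbors(100.0, 70.0))   # bound is 0.70, below 0.8
print(can_be_neighbors(100.0, 85.0))   # bound is 0.85, superposition still needed
```

A filter of this kind only rules pairs out; pairs that pass still need the full superposition to obtain their actual ST and CT values.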
Published on May 3, 2011

Signalogs: orthology-based identification of novel signaling pathway components in three metazoans.

Authors: Korcsmaros T, Szalay MS, Rovo P, Palotai R, Fazekas D, Lenti K, Farkas IJ, Csermely P, Vellai T

Abstract: BACKGROUND: Uncovering novel components of signal transduction pathways and their interactions within species is a central task in current biological research. Orthology alignment and functional genomics approaches allow the effective identification of signaling proteins by cross-species data integration. Recently, functional annotation of orthologs was transferred across organisms to predict novel roles for proteins. Despite the wide use of these methods, annotation of complete signaling pathways has not yet been transferred systematically between species. PRINCIPAL FINDINGS: Here we introduce the concept of 'signalog' to describe potential novel signaling function of a protein on the basis of the known signaling role(s) of its ortholog(s). To identify signalogs on genomic scale, we systematically transferred signaling pathway annotations among three animal species, the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and humans. Using orthology data from InParanoid and signaling pathway information from the SignaLink database, we predict 88 worm, 92 fly, and 73 human novel signaling components. Furthermore, we developed an on-line tool and an interactive orthology network viewer to allow users to predict and visualize components of orthologous pathways. We verified the novelty of the predicted signalogs by literature search and comparison to known pathway annotations. In C. elegans, 6 out of the predicted novel Notch pathway members were validated experimentally. Our approach predicts signaling roles for 19 human orthodisease proteins and 5 known drug targets, and suggests 14 novel drug target candidates. CONCLUSIONS: Orthology-based pathway membership prediction between species enables the identification of novel signaling pathway components that we referred to as signalogs. Signalogs can be used to build a comprehensive signaling network in a given species. Such networks may increase the biomedical utilization of C. elegans and D. melanogaster. In humans, signalogs may identify novel drug targets and new signaling mechanisms for approved drugs.
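
The annotation-transfer idea described above can be sketched in a few lines. This is only an illustration of the general principle (a protein inherits a candidate pathway role from its ortholog when it lacks that annotation itself); the function name, the data layout, and the protein identifiers are hypothetical and do not reflect the InParanoid or SignaLink formats or the authors' actual pipeline.

```python
def predict_signalogs(orthologs, source_annotations, target_annotations):
    """Transfer pathway annotations across ortholog pairs.

    orthologs: iterable of (source_protein, target_protein) pairs
    source_annotations / target_annotations: dict mapping protein -> set of pathways
    Returns a dict mapping target protein -> set of newly predicted pathways.
    """
    predictions = {}
    for source_protein, target_protein in orthologs:
        already_known = target_annotations.get(target_protein, set())
        for pathway in source_annotations.get(source_protein, set()):
            # Only report roles that are not already annotated in the target species.
            if pathway not in already_known:
                predictions.setdefault(target_protein, set()).add(pathway)
    return predictions


# Toy, made-up data: two worm proteins with a Notch role and their human orthologs.
orthologs = [("worm_p1", "human_p1"), ("worm_p2", "human_p2")]
worm_annotations = {"worm_p1": {"Notch"}, "worm_p2": {"Notch"}}
human_annotations = {"human_p2": {"Notch"}}  # human_p2 is already a known member

print(predict_signalogs(orthologs, worm_annotations, human_annotations))
```

In this toy case only `human_p1` is reported, since `human_p2` already carries the annotation; a real analysis would also need to handle many-to-many orthology and confidence scores.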
Published on April 27, 2011

The NCGC pharmaceutical collection: a comprehensive resource of clinically approved drugs enabling repurposing and chemical genomics.

Authors: Huang R, Southall N, Wang Y, Yasgar A, Shinn P, Jadhav A, Nguyen DT, Austin CP

Abstract: Small-molecule compounds approved for use as drugs may be "repurposed" for new indications and studied to determine the mechanisms of their beneficial and adverse effects. A comprehensive collection of all small-molecule drugs approved for human use would be invaluable for systematic repurposing across human diseases, particularly for rare and neglected diseases, for which the cost and time required for development of a new chemical entity are often prohibitive. Previous efforts to build such a comprehensive collection have been limited by the complexities, redundancies, and semantic inconsistencies of drug naming within and among regulatory agencies worldwide; a lack of clear conceptualization of what constitutes a drug; and a lack of access to physical samples. We report here the creation of a definitive, complete, and nonredundant list of all approved molecular entities as a freely available electronic resource and a physical collection of small molecules amenable to high-throughput screening.
Published on April 26, 2011

Diversity-oriented synthesis of macrocyclic peptidomimetics.

Authors: Isidro-Llobet A, Murillo T, Bello P, Cilibrizzi A, Hodgkinson JT, Galloway WR, Bender A, Welch M, Spring DR

Abstract: Structurally diverse libraries of novel small molecules represent important sources of biologically active agents. In this paper we report the development of a diversity-oriented synthesis strategy for the generation of diverse small molecules based around a common macrocyclic peptidomimetic framework, containing structural motifs present in many naturally occurring bioactive compounds. Macrocyclic peptidomimetics are largely underrepresented in current small-molecule screening collections owing primarily to synthetic intractability; thus novel molecules based around these structures represent targets of significant interest, both from a biological and a synthetic perspective. In a proof-of-concept study, the synthesis of a library of 14 such compounds was achieved. Analysis of chemical space coverage confirmed that the compound structures indeed occupy underrepresented areas of chemistry in screening collections. Crucial to the success of this approach was the development of novel methodologies for the macrocyclic ring closure of chiral alpha-azido acids and for the synthesis of diketopiperazines using solid-supported N-methylmorpholine. Owing to their robust and flexible natures, it is envisaged that both new methodologies will prove to be valuable in a wider synthetic context.
Published on April 25, 2011

Residue preference mapping of ligand fragments in the Protein Data Bank.

Authors: Wang L, Xie Z, Wipf P, Xie XQ

Abstract: The interaction between small molecules and proteins is one of the major concerns for structure-based drug design because the principles of protein-ligand interactions and molecular recognition are not thoroughly understood. Fortunately, the analysis of protein-ligand complexes in the Protein Data Bank (PDB) enables unprecedented possibilities for new insights. Herein, we applied molecule-fragmentation algorithms to split the ligands extracted from PDB crystal structures into small fragments. Subsequently, we have developed a ligand fragment and residue preference mapping (LigFrag-RPM) algorithm to map the profiles of the interactions between these fragments and the 20 proteinogenic amino acid residues. A total of 4032 fragments were generated from 71 798 PDB ligands by a ring cleavage (RC) algorithm. Among these ligand fragments, 315 unique fragments were characterized with the corresponding fragment-residue interaction profiles by counting residues close to these fragments. The interaction profiles revealed that these fragments have specific preferences for certain types of residues. The applications of these interaction profiles were also explored and evaluated in case studies, showing great potential for the study of protein-ligand interactions and drug design. Our studies demonstrated that the fragment-residue interaction profiles generated from the PDB ligand fragments can be used to detect whether these fragments are in their favorable or unfavorable environments. The algorithm for a ligand fragment and residue preference mapping (LigFrag-RPM) developed here also has the potential to guide lead chemistry modifications as well as binding residues predictions.
Published on April 25, 2011

Data-driven high-throughput prediction of the 3-D structure of small molecules: review and progress.

Authors: Andronico A, Randall A, Benz RW, Baldi P

Abstract: Accurate prediction of the 3-D structure of small molecules is essential in order to understand their physical, chemical, and biological properties, including how they interact with other molecules. Here, we survey the field of high-throughput methods for 3-D structure prediction and set up new target specifications for the next generation of methods. We then introduce COSMOS, a novel data-driven prediction method that utilizes libraries of fragment and torsion angle parameters. We illustrate COSMOS using parameters extracted from the Cambridge Structural Database (CSD) by analyzing their distribution and then evaluating the system's performance in terms of speed, coverage, and accuracy. Results show that COSMOS represents a significant improvement when compared to state-of-the-art prediction methods, particularly in terms of coverage of complex molecular structures, including metal-organics. COSMOS can predict structures for 96.4% of the molecules in the CSD (99.6% organic, 94.6% metal-organic), whereas the widely used commercial method CORINA predicts structures for 68.5% (98.5% organic, 51.6% metal-organic). On the common subset of molecules predicted by both methods, COSMOS makes predictions with an average speed per molecule of 0.15 s (0.10 s organic, 0.21 s metal-organic) and an average rmsd of 1.57 Å (1.26 Å organic, 1.90 Å metal-organic), and CORINA makes predictions with an average speed per molecule of 0.13 s (0.18 s organic, 0.08 s metal-organic) and an average rmsd of 1.60 Å (1.13 Å organic, 2.11 Å metal-organic). COSMOS is available through the ChemDB chemoinformatics Web portal at http://cdb.ics.uci.edu/ .
Published in March 2011

A new approach to assess and predict the functional roles of proteins across all known structures.

Authors: Julfayev ES, McLaughlin RJ, Tao YP, McLaughlin WA

Abstract: The three-dimensional atomic structures of proteins provide information regarding their function, and codified relationships between structure and function enable the assessment of function from structure. In the current study, a new data mining tool was implemented that checks current gene ontology (GO) annotations and predicts new ones across all the protein structures available in the Protein Data Bank (PDB). The tool overcomes some of the challenges of utilizing large amounts of protein annotation and measurement information to form correspondences between protein structure and function. Protein attributes were extracted from the Structural Biology Knowledgebase and open source biological databases. Based on the presence or absence of a given set of attributes, a given protein's functional annotations were inferred. The results show that attributes derived from the three-dimensional structures of proteins enhanced predictions over those using attributes derived only from primary amino acid sequence. Some predictions reflected known but not completely documented GO annotations. For example, predictions for the GO term for copper ion binding reflected information that a copper ion was known to interact with the protein, based on information in a ligand interaction database. Other predictions were novel and require further experimental validation. These include predictions for proteins labeled as unknown function in the PDB. Two examples are a role in the regulation of transcription for the protein AF1396 from Archaeoglobus fulgidus and a role in RNA metabolism for the protein psuG from Thermotoga maritima.
Published on March 29, 2011

Building the process-drug-side effect network to discover the relationship between biological processes and side effects.

Authors: Lee S, Lee KH, Song M, Lee D

Abstract: BACKGROUND: Side effects are unwanted responses to drug treatment and are important resources for human phenotype information. The recent development of a database on side effects, the side effect resource (SIDER), is a first step in documenting the relationship between drugs and their side effects. It is, however, insufficient to simply find the association of drugs with biological processes; that relationship is crucial because drugs that influence biological processes can have an impact on phenotype. Therefore, knowing which processes respond to drugs that influence the phenotype will enable more effective and systematic study of the effect of drugs on phenotype. To the best of our knowledge, the relationship between biological processes and side effects of drugs has not yet been systematically researched. METHODS: We propose 3 steps for systematically searching relationships between drugs and biological processes: enrichment scores (ES) calculations, t-score calculation, and threshold-based filtering. Subsequently, the side effect-related biological processes are found by merging the drug-biological process network and the drug-side effect network. Evaluation is conducted in 2 ways: first, by discerning the number of biological processes discovered by our method that co-occur with Gene Ontology (GO) terms in relation to effects extracted from PubMed records using a text-mining technique and second, determining whether there is improvement in performance by limiting response processes by drugs sharing the same side effect to frequent ones alone. RESULTS: The multi-level network (the process-drug-side effect network) was built by merging the drug-biological process network and the drug-side effect network. We generated a network of 74 drugs-168 side effects-2209 biological process relation resources. The preliminary results showed that the process-drug-side effect network was able to find meaningful relationships between biological processes and side effects in an efficient manner. CONCLUSIONS: We propose a novel process-drug-side effect network for discovering the relationship between biological processes and side effects. By exploring the relationship between drugs and phenotypes through a multi-level network, the mechanisms underlying the effect of specific drugs on the human body may be understood.
Published on March 29, 2011

A linguistic rule-based approach to extract drug-drug interactions from pharmacological documents.

Authors: Segura-Bedmar I, Martinez P, de Pablo-Sanchez C

Abstract: BACKGROUND: A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. The increasing volume of the scientific literature overwhelms health care professionals trying to be kept up-to-date with all published studies on DDI. METHODS: This paper describes a hybrid linguistic approach to DDI extraction that combines shallow parsing and syntactic simplification with pattern matching. Appositions and coordinate structures are interpreted based on shallow syntactic parsing provided by the UMLS MetaMap tool (MMTx). Subsequently, complex and compound sentences are broken down into clauses from which simple sentences are generated by a set of simplification rules. A pharmacist defined a set of domain-specific lexical patterns to capture the most common expressions of DDI in texts. These lexical patterns are matched with the generated sentences in order to extract DDIs. RESULTS: We have performed different experiments to analyze the performance of the different processes. The lexical patterns achieve a reasonable precision (67.30%), but very low recall (14.07%). The inclusion of appositions and coordinate structures helps to improve the recall (25.70%), however, precision is lower (48.69%). The detection of clauses does not improve the performance. CONCLUSIONS: Information Extraction (IE) techniques can provide an interesting way of reducing the time spent by health care professionals on reviewing the literature. Nevertheless, no approach has been carried out to extract DDI from texts. To the best of our knowledge, this work proposes the first integral solution for the automatic extraction of DDI from biomedical texts.
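
The precision/recall trade-off reported above can be summarized with a harmonic mean (F1). The abstract does not report F1; the sketch below simply applies the standard formula F1 = 2PR / (P + R) to the two precision/recall pairs quoted in the abstract, and the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as quoted in the abstract, converted to fractions.
patterns_only = f1_score(0.6730, 0.1407)   # lexical patterns alone
with_syntax = f1_score(0.4869, 0.2570)     # plus appositions and coordinate structures

print(f"lexical patterns only:         F1 = {patterns_only:.3f}")
print(f"with appositions/coordination: F1 = {with_syntax:.3f}")
```

Under this derived measure the second configuration scores higher, since the gain in recall outweighs the loss in precision.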
Published on March 22, 2011

A network-based multi-target computational estimation scheme for anticoagulant activities of compounds.

Authors: Li Q, Li X, Li C, Chen L, Song J, Tang Y, Xu X

Abstract: BACKGROUND: Traditional virtual screening methods pay more attention to the predicted binding affinity between a drug molecule and a target related to a certain disease than to phenotypic data of the drug molecule against the disease system, and so are often less effective at discovering drugs used to treat many types of complex diseases. Virtual screening against a complex disease by general network estimation has become feasible with the development of network biology and system biology. More effective methods of computational estimation for the whole efficacy of a compound in a complex disease system are needed, given the distinct weights of the different targets in a biological process and the standpoint that partial inhibition of several targets can be more efficient than the complete inhibition of a single target. METHODOLOGY: We developed a novel approach by integrating the affinity predictions from multi-target docking studies with biological network efficiency analysis to estimate the anticoagulant activities of compounds. From results of network efficiency calculation for the human clotting cascade, factor Xa and thrombin were identified as the two most fragile enzymes, while the catalytic reaction mediated by complex IXa:VIIIa and the formation of the complex VIIIa:IXa were recognized as the two most fragile biological matter in the human clotting cascade system. Furthermore, the method which combined network efficiency with molecular docking scores was applied to estimate the anticoagulant activities of a series of argatroban intermediates and eight natural products respectively. The better correlation (r = 0.671) between the experimental data and the decrease of the network deficiency suggests that the approach could be a promising computational systems biology tool to aid identification of anticoagulant activities of compounds in drug discovery. CONCLUSIONS: This article proposes a network-based multi-target computational estimation method for anticoagulant activities of compounds by combining network efficiency analysis with scoring function from molecular docking.
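
To make the notion of a "fragile" network component concrete, here is a minimal sketch that assumes network efficiency means the commonly used global efficiency (the average of 1/d over all ordered node pairs, with d the shortest-path length). The abstract does not give the exact definition or the weighting by docking scores, so this is an illustration of the general idea on a made-up toy graph, not the authors' method.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def global_efficiency(graph):
    """Average of 1/d over ordered node pairs; unreachable pairs contribute 0."""
    nodes = list(graph)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for source in nodes:
        dist = shortest_path_lengths(graph, source)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))


def remove_node(graph, target):
    """Copy of the graph without target and without edges pointing to it."""
    return {n: [m for m in nbrs if m != target]
            for n, nbrs in graph.items() if n != target}


def efficiency_drop(graph, target):
    """Decrease in global efficiency when target is removed (larger = more fragile)."""
    return global_efficiency(graph) - global_efficiency(remove_node(graph, target))


# Hypothetical three-node chain A - B - C (undirected, stored as adjacency lists).
toy = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
for node in toy:
    print(node, round(efficiency_drop(toy, node), 3))
```

In the toy chain, removing the middle node disconnects the network and gives the largest drop, which is the sense in which a node would be called fragile here.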