Publications Search
Explore how scientists all over the world use DrugBank in their research.
Published in 2016

MetMaxStruct: A Tversky-Similarity-Based Strategy for Analysing the (Sub)Structural Similarities of Drugs and Endogenous Metabolites.

Authors: O'Hagan S, Kell DB

Abstract: BACKGROUND: Previous studies compared the molecular similarity of marketed drugs and endogenous human metabolites (endogenites), using a series of fingerprint-type encodings, variously ranked and clustered using the Tanimoto (Jaccard) similarity coefficient (TS). Because this gives equal weight to all parts of the encoding (and thence to different substructures in the molecule), it may not be optimal, since in many cases not all parts of the molecule will bind to their macromolecular targets. Unsupervised methods alone cannot uncover this. We here explore the kinds of differences that may be observed when the TS is replaced (in a manner more equivalent to semi-supervised learning) by variants of the asymmetric Tversky (TV) similarity, which includes alpha and beta parameters. RESULTS: Dramatic differences are observed in (i) the drug-endogenite similarity heatmaps, (ii) the cumulative "greatest similarity" curves, and (iii) the fraction of drugs with a Tversky similarity to a metabolite exceeding a given value when the Tversky alpha and beta parameters are varied from their Tanimoto values. The same is true when the sum of the alpha and beta parameters is varied. A clear trend toward increased endogenite-likeness of marketed drugs is observed when alpha or beta adopt values nearer the extremes of their range, and when their sum is smaller. The kinds of molecules exhibiting the greatest similarity to two interrogating drug molecules (chlorpromazine and clozapine) also vary, in both nature and similarity values, as alpha and beta are varied. The same is true for the converse, when drugs are interrogated with an endogenite. The fraction of drugs with a Tversky similarity to a molecule in a library exceeding a given value depends on the contents of that library, and alpha and beta may be "tuned" accordingly, in a semi-supervised manner.
At some values of alpha and beta drug discovery library candidates or natural products can "look" much more like (i.e., have a numerical similarity much closer to) drugs than do even endogenites. CONCLUSIONS: Overall, the Tversky similarity metrics provide a more useful range of examples of molecular similarity than does the simpler Tanimoto similarity, and help to draw attention to molecular similarities that would not be recognized if Tanimoto alone were used. Hence, the Tversky similarity metrics are likely to be of significant value in many general problems in cheminformatics.
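To make the contrast between the two metrics concrete, the sketch below compares them on binary fingerprints represented as sets of "on" bit indices. It assumes the standard Tversky form |A∩B| / (|A∩B| + α|A∖B| + β|B∖A|): with α = β = 1 it reduces to Tanimoto, and choosing α ≠ β penalizes the unshared bits of one molecule more than those of the other. The fingerprints and the `fraction_above` helper are hypothetical illustrations, not data or code from the paper.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def tversky(a: set[int], b: set[int], alpha: float, beta: float) -> float:
    """Tversky similarity: |A&B| / (|A&B| + alpha*|A-B| + beta*|B-A|).

    alpha weights bits found only in `a`; beta weights bits found only in `b`.
    alpha = beta = 1 gives Tanimoto; alpha = beta = 0.5 gives Dice.
    """
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def fraction_above(drugs, library, threshold, alpha, beta):
    """Hypothetical helper: fraction of drugs whose greatest Tversky similarity
    to any library member exceeds `threshold`."""
    best = [max(tversky(d, m, alpha, beta) for m in library) for d in drugs]
    return sum(s > threshold for s in best) / len(best)


# Toy fingerprints: the "drug" contains every bit of the "metabolite".
drug = {1, 2, 3, 4, 5, 6, 7, 8}
metabolite = {1, 2, 3, 4}

print(tanimoto(drug, metabolite))           # 0.5
print(tversky(drug, metabolite, 1.0, 1.0))  # 0.5, identical to Tanimoto
print(tversky(drug, metabolite, 0.1, 0.9))  # about 0.909: drug-only bits barely penalized
print(tversky(drug, metabolite, 0.9, 0.1))  # about 0.526
print(fraction_above([drug], [metabolite], 0.6, 1.0, 1.0))  # 0.0
print(fraction_above([drug], [metabolite], 0.6, 0.1, 0.9))  # 1.0
```

Because the drug's extra bits are down-weighted when α is small, the same pair crosses a 0.6 threshold under one (α, β) setting and not under the Tanimoto setting, which is the kind of shift the abstract reports at library scale.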
Published in 2016

MMpI: A Wide Range of Available Compounds of Matrix Metalloproteinase Inhibitors.

Authors: Muvva C, Patra S, Venkatesan S

Abstract: Matrix metalloproteinases (MMPs) are a family of zinc-dependent proteinases involved in the regulation of the extracellular signaling and structural matrix environment of cells and tissues. MMPs are considered promising targets for the treatment of many diseases. Therefore, a database of MMP inhibitors would accelerate research in this area, given the implication of MMPs in the above-mentioned diseases and the limitations of first- and second-generation inhibitors. In this communication, we report the development of a new MMpI database which provides useful information for all researchers working in this field. It is a unique, web-accessible resource that contains detailed information on MMP inhibitors, including small molecules, peptides and MMP Drug Leads. The database contains entries for ~3000 inhibitors, including ~72 MMP Drug Leads and ~73 peptide-based inhibitors. It provides the molecular and structural details necessary for drug discovery and development. The MMpI database contains physical properties and 2D and 3D structures (mol2 and pdb format files) of MMP inhibitors. Other data fields are hyperlinked to PubChem, ChEMBL, BindingDB, DrugBank, PDB, MEROPS and PubMed. The database offers extensive search options by MMpI ID, IUPAC name, chemical structure and research article title. The MMP inhibitors in the MMpI database were optimized using the Python-based Hierarchical Environment for Integrated Xtallography (Phenix) software. MMpI is the only public database that contains and provides complete information on MMP inhibitors. Database URL: http://clri.res.in/subramanian/databases/mmpi/index.php.
Published in 2016

Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning.

Authors: Zhang Y, Xu J, Chen H, Wang J, Wu Y, Prakasam M, Xu H

Abstract: Medicinal chemistry patents contain rich information about chemical compounds. Although much effort has been devoted to extracting chemical entities from scientific literature, few patent mining systems are publicly available, probably due to the lack of large manually annotated corpora. To accelerate the development of information extraction systems for medicinal chemistry patents, the 2015 BioCreative V challenge organized a track on Chemical and Drug Named Entity Recognition from patent text (CHEMDNER patents). This track included three individual subtasks: (i) Chemical Entity Mention Recognition in Patents (CEMP), (ii) Chemical Passage Detection (CPD) and (iii) Gene and Protein Related Object task (GPRO). We participated in two subtasks, CEMP and CPD, using machine learning-based systems built on conditional random fields (CRF) and structured support vector machines (SSVMs). To improve the performance of the NER systems, two strategies were proposed for feature engineering: (i) domain knowledge features of dictionaries, chemical structural patterns and semantic type information present in the context of the candidate chemical and (ii) unsupervised feature learning algorithms to generate word representation features by Brown clustering and a novel binarized word embedding to enhance the generalizability of the system. Further, the system output for the CPD task was derived from the patent titles and abstracts with chemicals recognized in the CEMP task. The effects of the proposed feature strategies on both machine learning-based systems were investigated.
Our best system achieved the second-best performance among 21 participating teams in CEMP, with a precision of 87.18%, a recall of 90.78% and an F-measure of 88.94%, and was the top-performing system among nine participating teams in CPD, with a sensitivity of 98.60%, a specificity of 87.21%, an accuracy of 94.75%, a Matthews correlation coefficient (MCC) of 88.24%, a precision at full recall (P_full_R) of 66.57% and an area under the precision-recall curve (AUC_PR) of 0.9347. The SSVM-based CEMP systems outperformed the CRF-based CEMP systems when using the same features. Features generated from both the domain knowledge and unsupervised learning algorithms significantly improved the chemical NER task on patents. Database URL: http://database.oxfordjournals.org/content/2016/baw049.
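The CEMP figures are internally consistent: the F-measure is the harmonic mean of precision and recall, and the short check below reproduces 88.94% from the reported precision and recall. The MCC helper only illustrates the standard formula; the abstract does not give the confusion-matrix counts needed to recompute the reported CPD value, so any counts passed to it here are toy values.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def matthews_cc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from a binary confusion matrix (0.0 when undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Precision and recall reported for the CEMP task, in percent.
print(round(f_measure(87.18, 90.78), 2))  # 88.94
```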
Published in 2016

ChemProt-3.0: a global chemical biology diseases mapping.

Authors: Kringelum J, Kjaerulff SK, Brunak S, Lund O, Oprea TI, Taboureau O

Abstract: ChemProt is a publicly available compilation of chemical-protein-disease annotation resources that enables the study of systems pharmacology for a small molecule across multiple layers of complexity from molecular to clinical levels. In this third version, ChemProt has been updated to more than 1.7 million compounds with 7.8 million bioactivity measurements for 19,504 proteins. Here, we report the implementation of a global pharmacological heatmap, supporting user-friendly navigation of chemogenomics space. This facilitates the visualization and selection of chemicals that share similar structural properties. In addition, the user can search by compound, target, pathway, disease and clinical effect. Genetic variations associated with target proteins were integrated, making it possible to plan pharmacogenetic studies and to suggest variability in human response to drugs. Finally, Quantitative Structure-Activity Relationship models for 850 proteins having sufficient data were implemented, enabling secondary pharmacological profiling predictions from molecular structure. Database URL: http://potentia.cbs.dtu.dk/ChemProt/.
Published in 2016

Collective pairwise classification for multi-way analysis of disease and drug data.

Authors: Zitnik M, Zupan B

Abstract: Interactions between drugs, drug targets or diseases can be predicted on the basis of molecular, clinical and genomic features by, for example, exploiting similarity of disease pathways, chemical structures, activities across cell lines or clinical manifestations of diseases. A successful way to better understand complex interactions in biomedical systems is to employ collective relational learning approaches that can jointly model diverse relationships present in multiplex data. We propose a novel collective pairwise classification approach for multi-way data analysis. Our model leverages the strengths of latent factor models and classifies relationships in a large relational data domain using a pairwise ranking loss. In contrast to current approaches, our method estimates probabilities such that probabilities for existing relationships are higher than for assumed-to-be-negative relationships. Although our method is related to maximizing the non-differentiable area under the ROC curve, we were able to design a learning algorithm that scales well on multi-relational data encoding interactions between thousands of entities. We use the new method to infer relationships from multiplex drug data and to predict connections between clinical manifestations of diseases and their underlying molecular signatures. Our method achieves promising predictive performance when compared to state-of-the-art alternative approaches and can make "category-jumping" predictions about diseases from genomic and clinical data generated far outside the molecular context.
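The link between a pairwise ranking loss and the area under the ROC curve can be made concrete: AUC equals the fraction of (positive, negative) pairs in which the positive is scored higher, which is a step function of the scores and therefore non-differentiable. A common remedy, assumed here purely for illustration and not necessarily the authors' formulation, replaces the step with a smooth logistic surrogate that gradient methods can optimize. The scores below are toy values.

```python
import math


def pairwise_auc(pos_scores, neg_scores):
    """ROC AUC computed directly as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is the non-differentiable target quantity.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


def pairwise_logistic_loss(pos_scores, neg_scores):
    """Smooth surrogate: mean of log(1 + exp(-(s_pos - s_neg))) over all pairs."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            d = p - n
            # Numerically stable form of log(1 + exp(-d)).
            total += max(-d, 0.0) + math.log1p(math.exp(-abs(d)))
    return total / (len(pos_scores) * len(neg_scores))


observed = [0.9, 0.8, 0.4]  # scores for known relationships (toy values)
unobserved = [0.7, 0.3]     # scores for assumed-to-be-negative pairs (toy values)
print(pairwise_auc(observed, unobserved))  # 5 of 6 pairs ranked correctly, about 0.833
```

Minimizing the surrogate pushes each positive score above each negative score, which raises the pairwise AUC without ever differentiating the step function itself.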
Published in 2016

In silico profiling for secondary metabolites from Lepidium meyenii (maca) by the pharmacophore and ligand-shape-based joint approach.

Authors: Yi F, Tan XL, Yan X, Liu HB

Abstract: BACKGROUND: Lepidium meyenii Walpers (maca) is an herb known as a traditional nutritional supplement and widely used in Peru, North America, and Europe to enhance human fertility and treat osteoporosis. The secondary metabolites of maca, namely, maca alkaloids, macaenes, and macamides, are bioactive compounds, but their targets are undefined. METHODS: A joint approach, combining pharmacophore-based screening of the PharmaDB targets database with ligand-shape-similarity-based WEGA validation, was proposed to predict the targets of these unique constituents and was performed using Discovery Studio 4.5 and PharmaDB. A compounds-targets-diseases network was established using Cytoscape 3.2. The selected targets and their genes were analyzed using Ingenuity Pathway Analysis and GeneMANIA. RESULTS: Targets were identified for osteoporosis (8 targets), prostate cancer (9 targets), and kidney diseases (11 targets). This was the first study to identify targets of these bioactive compounds in maca for cardiovascular diseases (29 targets). The compound with the most targets (46) was an amide alkaloid (MA-24). CONCLUSION: In silico target fishing identified targets underlying maca's traditional use in the treatment and prevention of osteoporosis, prostate cancer, and kidney diseases, and pointed to its potential in treating cardiovascular diseases, as the most important of this herb's possible activities.
Published in 2016

PubMedPortable: A Framework for Supporting the Development of Text Mining Applications.

Authors: Döring K, Grüning BA, Telukunta KK, Thomas P, Günther S

Abstract: Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified entities. The BioCreative community supports these developments with yearly open challenges, which led to a standardised XML text annotation format called BioC. PubMed provides access to the largest open biomedical literature repository, but there is no unified way of connecting its data to natural language processing tools. Therefore, an appropriate data environment is needed as a basis to combine different software solutions and to develop customised text mining applications. PubMedPortable builds a relational database and a full text index on PubMed citations. It can be applied either to the complete PubMed data set or an arbitrary subset of downloaded PubMed XML files. The software provides the infrastructure to combine stand-alone applications by exporting different data formats, e.g. BioC. The presented workflows show how to use PubMedPortable to retrieve, store, and analyse a disease-specific data set. The provided use cases are well documented in the PubMedPortable wiki. The open-source software library is small, easy to use, and scalable to the user's system requirements. It is freely available for Linux on the web at https://github.com/KerstenDoering/PubMedPortable and for other operating systems as a virtual container. The approach was tested extensively and applied successfully in several projects.
Published in 2016

BioTriangle: a web-accessible platform for generating various molecular representations for chemicals, proteins, DNAs/RNAs and their interactions.

Authors: Dong J, Yao ZJ, Wen M, Zhu MF, Wang NN, Miao HY, Lu AP, Zeng WB, Cao DS

Abstract: BACKGROUND: Growing evidence from network biology indicates that most cellular components exert their functions through interactions with other cellular components, such as proteins, DNAs, RNAs and small molecules. The rapidly increasing amount of publicly available data in biology and chemistry enables researchers to revisit interaction problems through systematic integration and analysis of heterogeneous data. Some tools have been developed to represent these components; however, they have limitations and focus on only one class of molecule: small molecules, proteins or DNAs/RNAs. To the best of our knowledge, there is still a lack of freely available, easy-to-use, integrated platforms for generating molecular descriptors of DNAs/RNAs, proteins, small molecules and their interactions. RESULTS: Herein, we developed a comprehensive molecular representation platform, called BioTriangle, to emphasize the integration of cheminformatics and bioinformatics into a molecular informatics platform for computational biology study. It contains a feature-rich toolkit for the characterization of various biological molecules and complex interaction samples, including chemicals, proteins, DNAs/RNAs and even their interactions. With BioTriangle, users can conveniently run a full pipeline, from retrieving molecular data and computing molecular representations to constructing machine learning models. CONCLUSION: BioTriangle provides a user-friendly interface to calculate various features of biological molecules and complex interaction samples. The computing tasks can be submitted and performed simply in a browser without any sophisticated installation and configuration process. BioTriangle is freely available at http://biotriangle.scbdd.com. Graphical abstract: An overview of BioTriangle, a platform for generating various molecular representations for chemicals, proteins, DNAs/RNAs and their interactions.
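As an illustration of what a "molecular representation" means for a protein, one of the simplest sequence descriptors is amino acid composition: a fixed-length vector of residue percentages that a machine learning model can consume directly. The function below is a standalone sketch written for this note; it is not BioTriangle's interface, and BioTriangle's actual descriptor set and options may differ. The sequence is a toy example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Percentage of each standard amino acid in a protein sequence (20-dimensional descriptor).

    Non-standard letters (e.g. X, B, U) are ignored rather than counted.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * n / total for aa, n in counts.items()}


aac = amino_acid_composition("MKAAX")  # toy sequence; X is skipped
print(aac["A"], aac["K"], aac["M"])    # 50.0 25.0 25.0
```

Every sequence maps to the same 20 keys regardless of length, which is what lets such descriptors be stacked into a feature matrix alongside small-molecule or nucleic-acid features.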
Published in 2016

Lead identification for the K-Ras protein: virtual screening and combinatorial fragment-based approaches.

Authors: Pathan AA, Panthi B, Khan Z, Koppula PR, Alanazi MS, Sachchidanand, Parine NR, Chourasia M

Abstract: OBJECTIVE: The Kirsten rat sarcoma (K-Ras) protein is a member of the Ras family, which belongs to the superfamily of small guanosine triphosphatases. The members of this family share a conserved structure and biochemical properties, acting as binary molecular switches. The guanosine triphosphate-bound active K-Ras interacts with a range of effectors, resulting in the stimulation of downstream signaling pathways regulating cell proliferation, differentiation, and apoptosis. Efforts to target K-Ras have so far been unsuccessful, placing it among the high-value molecules against which developing a therapy would have an enormous impact. K-Ras transduces signals when bound to guanosine triphosphate by directly binding to downstream effector proteins, but in the guanosine diphosphate-bound conformation these interactions are disrupted. METHODS: In the present study, we targeted the nucleotide-binding site in the "on" and "off" state conformations of the K-Ras protein to identify suitable lead compounds. A structure-based virtual screening approach was used to screen compounds from different databases, followed by a combinatorial fragment-based approach to design an appropriate lead for the K-Ras protein. RESULTS: Interestingly, the designed compounds exhibit a binding preference for the "off" state over the "on" state conformation of the K-Ras protein. Moreover, the designed compounds' interactions resemble those of guanosine diphosphate, and thus the compounds could presumably act as potential leads for K-Ras. The predicted drug-likeness properties suggest that these compounds follow Lipinski's rule of five and have tolerable absorption, distribution, metabolism, excretion and toxicity values. CONCLUSION: Thus, through the current study, we propose targeting only "off" state conformations as a promising strategy for the design of reversible inhibitors to pharmacologically inhibit distinct conformations of the K-Ras protein.
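Lipinski's rule of five flags compounds as likely to have poor oral absorption when they exceed any of four thresholds: molecular weight above 500 Da, log P above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors. The sketch below takes those four properties as already computed (for example by a cheminformatics toolkit) rather than deriving them from structure, and the property values shown are hypothetical, not those of the compounds designed in the study. Allowing one violation follows a widely used filtering convention; the abstract does not say which variant the authors applied.

```python
from dataclasses import dataclass


@dataclass
class MolecularProperties:
    """Precomputed properties of one compound (values supplied by the caller)."""
    molecular_weight: float  # Da
    logp: float              # octanol-water partition coefficient (log P)
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(p: MolecularProperties) -> list[str]:
    """List which of Lipinski's four criteria a compound breaks."""
    violations = []
    if p.molecular_weight > 500:
        violations.append("molecular weight > 500 Da")
    if p.logp > 5:
        violations.append("log P > 5")
    if p.h_bond_donors > 5:
        violations.append("more than 5 H-bond donors")
    if p.h_bond_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


def passes_rule_of_five(p: MolecularProperties, max_violations: int = 1) -> bool:
    """Common filter convention: a compound passes with at most `max_violations` violations."""
    return len(rule_of_five_violations(p)) <= max_violations


# Hypothetical candidate, slightly over the molecular-weight limit.
candidate = MolecularProperties(molecular_weight=512.6, logp=3.1, h_bond_donors=4, h_bond_acceptors=9)
print(rule_of_five_violations(candidate))  # ['molecular weight > 500 Da']
print(passes_rule_of_five(candidate))      # True
```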
Published in 2016

The drug-minded protein interaction database (DrumPID) for efficient target analysis and drug development.

Authors: Kunz M, Liang C, Nilla S, Cecil A, Dandekar T

Abstract: The drug-minded protein interaction database (DrumPID) has been designed to provide fast, tailored information on drugs and their protein networks, including indications, protein targets and side-targets. Starting queries include compound, target and protein interactions and organism-specific protein families. Furthermore, drug names, chemical structures and their SMILES notation, affected proteins (potential drug targets), organisms and diseases can be queried, including various combinations and refinements of searches. Drugs and protein interactions are analyzed in detail with reference to protein structures and catalytic domains, related compound structures and potential targets in other organisms. DrumPID considers drug functionality, compound similarity, target structure, interactome analysis and organismic range for a compound, which is useful for drug development and for predicting drug side-effects and structure-activity relationships. Database URL: http://drumpid.bioapps.biozentrum.uni-wuerzburg.de.