Data Sources

UniChem currently contains data from the sources listed below.

src_id | Short name | Full name | Description | Process of Data Acquisition
1 | chembl | ChEMBL | A database of bioactive drug-like small molecules and bioactivities abstracted from the scientific literature. | Standard InChIs and InChIKeys provided on the FTP site for each release.
2 | drugbank | DrugBank | A database that combines drug (i.e. chemical, pharmacological and pharmaceutical) data with drug target (i.e. sequence, structure and pathway) information. | Standard InChIs and InChIKeys provided within an SD file on the FTP site for each release.
3 | pdb | PDBe (Protein Data Bank in Europe) | The European resource for the collection, organisation and dissemination of data on biological macromolecular structures, including structures of small-molecule ligands for proteins. | Standard InChIs and InChIKeys obtained by directly querying the Oracle database.
4 | gtopdb | Guide to Pharmacology | The IUPHAR (International Union of Basic and Clinical Pharmacology)/BPS (British Pharmacological Society) Guide to PHARMACOLOGY database contains structures of small-molecule ligands, peptides and antibodies, together with their affinities at protein targets. | Standard InChIs and InChIKeys available for download at http://www.guidetopharmacology.org/download.jsp
5 | pubchem_dotf | PubChem ('Drugs of the Future' subset) | A subset of the PubChem database, from the original depositor 'Drugs of the Future' (Prous). | Mol files for SIDs downloaded manually via the PubChem interface; Standard InChIs and InChIKeys generated with the InChI software. SIDs are used as identifiers.
6 | kegg_ligand | KEGG (Kyoto Encyclopedia of Genes and Genomes) Ligand | KEGG LIGAND is a composite database consisting of the COMPOUND, GLYCAN, REACTION, RPAIR, RCLASS and ENZYME databases, whose entries are identified by C, G, R, RP, RC and EC numbers, respectively. | Mol files were downloaded manually before this download became private. Standard InChIs and InChIKeys generated with the InChI software.
7 | chebi | ChEBI (Chemical Entities of Biological Interest) | ChEBI is a freely available dictionary of molecular entities focused on 'small' chemical compounds. | Standard InChIs (but no InChIKeys) provided on the FTP site; InChIKeys generated by UniChem. 'All star' compounds downloaded.
8 | nih_ncc | NIH Clinical Collection | Collections of plated arrays of small molecules that have a history of use in human clinical trials, assembled by the National Institutes of Health (NIH) through the Molecular Libraries Roadmap Initiative. | Mol files downloaded manually; Standard InChIs and InChIKeys generated with the InChI software.
9 | zinc | ZINC | A free database of commercially available compounds for virtual screening, provided by the Shoichet Laboratory in the Department of Pharmaceutical Chemistry at the University of California, San Francisco (UCSF) [Irwin and Shoichet, J. Chem. Inf. Model. 2005;45(1):177-82]. | The 'novirtual' subset of ZINC15, as a file containing InChIs and InChIKeys from http://files.docking.org/export/unichem/
10 | emolecules | eMolecules | A free chemical structure search engine containing millions of public-domain structures. Pricing, availability and vendor information require an eMolecules Plus subscription. | Downloaded as an SD file from the source and converted to InChIs and InChIKeys by UniChem.
11 | ibm | IBM strategic IP insight platform and the National Institutes of Health | The data are provided by IBM-NIH and include all chemistry extracted by text and image mining from the patent corpus (USPTO, WIPO and EPO) for patent documents published through 31 December 2010. | Identifiers in UniChem are IBM compound identifiers. InChIs and InChIKeys were generated in-house from SMILES.
12 | atlas | Gene Expression Atlas | The Gene Expression Atlas is a semantically enriched database of meta-analysis-based summary statistics over a curated subset of the ArrayExpress Archive, serving queries for condition-specific gene expression patterns as well as broader exploratory searches for biologically interesting genes/samples. | Currently extracted from compound names.
14 | fdasrs | FDA/USP Substance Registration System (SRS) | The primary goal of the FDA/USP Substance Registration System (SRS) is to unambiguously define all substances present in regulated products. Once a substance has been defined, the SRS assigns a strong identifier that is permanently associated with it: a UNII (Unique Ingredient Identifier). This is a non-proprietary, free, unique, unambiguous, non-semantic, alphanumeric identifier based on a substance's molecular structure and/or descriptive information. | InChIKeys downloaded in the UNII Data file from http://fdasis.nlm.nih.gov/srs/jsp/srs/uniiListDownload.jsp
15 | surechembl | SureChEMBL | SureChEMBL automatically extracts chemistry from the full text of all major patent authorities. Compounds are derived either from chemical names found in the text or from chemical depictions. All SureChEMBL compounds are included, except those failing the UniChem loading rules. | Standard InChIs and InChIKeys provided by a direct feed from the SureChEMBL database.
17 | pharmgkb | PharmGKB | PharmGKB (Pharmacogenomics Knowledgebase) is a comprehensive resource that curates knowledge about the impact of genetic variation on drug response for clinicians and researchers. | drugs.zip file from the download site https://www.pharmgkb.org/downloads/; the SMILES contained in this file are converted to Standard InChIs internally.
18 | hmdb | Human Metabolome Database (HMDB) | The Human Metabolome Database (HMDB) is a freely available electronic database containing detailed information about small-molecule metabolites found in the human body. It is intended for applications in metabolomics, clinical chemistry, biomarker discovery and general education. The database is designed to contain or link three kinds of data: 1) chemical data, 2) clinical data and 3) molecular biology/biochemistry data. | SD file downloaded from the source and converted to InChIs within UniChem.
20 | selleck | Selleck | Selleck Chemicals is a supplier of biochemical products, including over 1,000 inhibitor products. | SD file provided by email; InChIs generated by UniChem.
21 | pubchem_tpharma | PubChem ('Thomson Pharma' subset) | A subset of the PubChem database, from the original depositor 'Thomson Pharma'. | Mol files for SIDs downloaded manually via the PubChem interface; Standard InChIs and InChIKeys generated with the InChI software. SIDs are used as identifiers.
22 | pubchem | PubChem Compounds | A database of normalized PubChem compounds (CIDs) from the PubChem database. | Standard InChIs and InChIKeys provided on the FTP site.
23 | mcule | Mcule | An online drug discovery platform with virtual screening and molecular modelling services. | Standard InChIs and InChIKeys provided by email.
24 | nmrshiftdb2 | NMRShiftDB | An NMR web database for organic structures and their nuclear magnetic resonance (NMR) spectra. It supports spectrum prediction (13C, 1H and other nuclei) as well as searching spectra, structures and other properties. Last but not least, it features peer-reviewed submission of datasets by its users. | Standard InChIs and InChIKeys available for download at http://nmrshiftdb.nmr.uni-koeln.de/nmrshiftdb2unichem.txt
25 | lincs | Library of Integrated Network-based Cellular Signatures | The LINCS DCIC facilitates and standardizes the information relevant to LINCS assays, as described in http://www.lincsproject.org/data/data-standards/ | Standard InChIs and InChIKeys downloadable from http://lincs-dcic.org/metadata/SmallMolecules
26 | actor | ACToR | ACToR (Aggregated Computational Toxicology Resource). | Standard InChIs and InChIKeys generated from the SMILES in the database download.
27 | recon | Recon | A biochemical knowledge base of human metabolism. | Standard InChIs and InChIKeys provided by email.
28 | molport | MolPort | A database designed to help users find commercial sources of compounds. Access requires (free) registration. Only in-stock compounds are included from November 2017. | Standard InChIs and InChIKeys provided from the MolPort FTP site; access on request.
29 | nikkaji | Nikkaji | Nikkaji (the Japan Chemical Substance Dictionary) is an organic compound dictionary database prepared by the Japan Science and Technology Agency (JST). | Standard InChIs and InChIKeys available from the FTP site.
31 | bindingdb | BindingDB | A public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be drug targets with small, drug-like molecules. | Standard InChIs and InChIKeys available within a TSV file from the download page.
32 | comptox | EPA (Environmental Protection Agency) CompTox Dashboard | The foundation of chemical safety testing relies on chemistry information such as high-quality chemical structures and physicochemical properties. This information is used by scientists to predict the potential health risks of chemicals. The CompTox Dashboard is part of a suite of dashboards developed by EPA to help evaluate the safety of chemicals. It provides access to a variety of data and information on over 700,000 chemicals currently in use and of interest to environmental researchers. Within the CompTox Dashboard, users can access chemical structures, experimental and predicted physicochemical and toxicity data, and additional links to relevant websites and applications. It maps curated physicochemical property data associated with chemical substances to their corresponding chemical structures. | Standard InChIs and InChIKeys obtained from the download page.
33 | lipidmaps | LipidMaps | LIPID Metabolites And Pathways Strategy (LIPID MAPS) is a multi-institutional effort created to identify and quantitate, using a systems biology approach and sophisticated mass spectrometers, all of the major, and many minor, lipid species in mammalian cells, as well as to quantitate the changes in these species in response to perturbation. | Standard InChIs and InChIKeys obtained from the download page.
34 | drugcentral | DrugCentral | DrugCentral is an online drug information resource created and maintained by the Division of Translational Informatics at the University of New Mexico, providing information on active ingredients, chemical entities, pharmaceutical products, drug mode of action, indications and pharmacologic action. | Standard InChIs and InChIKeys available as a download.
35 | carotenoiddb | Carotenoid Database | A database of information on naturally occurring carotenoids from many organisms, extracted from the literature. | Standard InChIs and InChIKeys available as a download.
36 | metabolights | MetaboLights | A database for metabolomics experiments and derived information. The database is cross-species and cross-technique, and covers metabolite structures and their reference spectra as well as their biological roles, locations and concentrations, and experimental data from metabolic experiments. | Standard InChIs and InChIKeys available from the FTP site.
37 | brenda | BRENDA | A comprehensive enzyme information system containing enzyme functional data extracted directly from the primary literature. | Standard InChIKeys available as a download.
38 | rhea | Rhea | An expert-curated resource of biochemical reactions designed for the annotation of enzymes and genome-scale metabolic networks and models. | src_compound_ids extracted from the ChEBI download file.
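Most of the sources above either supply Standard InChIs and InChIKeys directly or have them generated by UniChem or the InChI software. As a rough illustration of what these identifiers look like, the sketch below checks only their surface syntax: a Standard InChI starts with `InChI=1S/`, and a Standard InChIKey is three hyphen-separated blocks of 14, 10 and 1 uppercase letters, where the second block ends in `S` (standard) followed by `A` (version 1). The helper names are hypothetical. This does not recompute the hash, validate chemistry, or reproduce UniChem's own loading rules.

```python
import re

# Standard InChIKey layout (syntax only):
#   14 letters  - hash of the main (connectivity) layer
#   8 letters   - hash of the remaining layers, then 'S' (standard) and 'A' (version 1)
#   1 letter    - protonation flag
_STD_INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{8}SA-[A-Z]")


def is_standard_inchi(inchi: str) -> bool:
    """Hypothetical helper: True if the string carries the Standard InChI prefix."""
    return inchi.startswith("InChI=1S/")


def is_standard_inchikey(key: str) -> bool:
    """Hypothetical helper: syntax-only check of a Standard InChIKey."""
    return _STD_INCHIKEY.fullmatch(key) is not None


# Aspirin (acetylsalicylic acid)
aspirin_inchi = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
aspirin_key = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"

print(is_standard_inchi(aspirin_inchi))                      # True
print(is_standard_inchikey(aspirin_key))                     # True
print(is_standard_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYNA-N"))  # False: 'N' marks a non-standard key
```

A syntax check like this is cheap enough to run on every line of a downloaded file before any heavier processing, but a key that passes it is not thereby guaranteed to correspond to a real structure.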

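The src_id values in the table above are not contiguous: 13, 16, 19 and 30 do not appear. A minimal sketch, assuming a local copy of part of the table is kept as a dictionary, shows lookups in both directions that return `None` for an unknown ID or name instead of failing. The `SOURCES` mapping is an excerpt only, and `short_name` and `src_id_for` are hypothetical helpers, not part of any UniChem interface.

```python
from typing import Dict, Optional

# Excerpt of the src_id -> short name mapping from the table above (not exhaustive).
SOURCES: Dict[int, str] = {
    1: "chembl",
    2: "drugbank",
    3: "pdb",
    7: "chebi",
    14: "fdasrs",
    15: "surechembl",
    22: "pubchem",
    38: "rhea",
}


def short_name(src_id: int) -> Optional[str]:
    """Hypothetical helper: short name for a src_id, or None if it is not in the excerpt."""
    return SOURCES.get(src_id)


def src_id_for(name: str) -> Optional[int]:
    """Hypothetical helper: case-insensitive reverse lookup by short name."""
    wanted = name.lower()
    for sid, short in SOURCES.items():
        if short == wanted:
            return sid
    return None


print(short_name(1))        # chembl
print(short_name(13))       # None: 13 is not a listed src_id
print(src_id_for("ChEBI"))  # 7
```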
