UniChem

Connectivity Search.

UniChem can be used to search for molecules related by common connectivity.




Molecules with the same atom connectivity but differing in stereochemistry or isotopic composition are identical in the 'connectivity layers' of their InChIs (and in the first hash block of their InChIKeys). UniChem exploits this property of InChIs to let the user easily search for molecules with a common connectivity. Background and documentation for Connectivity Searching are available in the UniChem documentation.
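
Because the connectivity lives in the first hash block, grouping InChIKeys by that block is enough to collect molecules with a common connectivity. The Python sketch below illustrates the idea; it is not UniChem's implementation. It assumes Standard InChIKeys of the usual shape (a 14-character block, a 10-character block and a one-character protonation flag, joined by hyphens), and the keys are made-up strings in that format, not real compounds.

```python
from collections import defaultdict


def connectivity_block(inchikey: str) -> str:
    """Return the first hash block (the connectivity part) of a Standard InChIKey."""
    blocks = inchikey.strip().split("-")
    if len(blocks) != 3 or len(blocks[0]) != 14:
        raise ValueError(f"not a Standard InChIKey: {inchikey!r}")
    return blocks[0]


def group_by_connectivity(keys):
    """Group InChIKeys that share the same connectivity block."""
    groups = defaultdict(list)
    for key in keys:
        groups[connectivity_block(key)].append(key)
    return dict(groups)


# Made-up keys: the first two share a connectivity block but differ in the
# second block, as stereoisomers or isotopologues would.
keys = [
    "ABCDEFGHIJKLMN-UHFFFAOYSA-N",
    "ABCDEFGHIJKLMN-QWERTYUISA-N",
    "ZYXWVUTSRQPONM-UHFFFAOYSA-N",
]
groups = group_by_connectivity(keys)
print(groups["ABCDEFGHIJKLMN"])
```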

To run a Connectivity Search, enter either a structure (an InChIKey) or a src_compound_id from one of the UniChem sources into the form below. Some example queries are given at the foot of the page to help you get started.

Queries return a list of src_compound_id assignments, together with a comparison of the InChI layers of the query and matched InChIs and the 'relationship' of the query to each retrieved src_compound_id.
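
The layer comparison can be pictured as splitting both InChIs on '/' and reporting the layers whose contents differ. The sketch below is only an illustration of that idea, not UniChem's output format: it assumes Standard InChI syntax (version, then formula, then layers identified by a leading letter such as c, h, t, m, s), and the two strings are the Standard InChIs of L- and D-alanine, which differ only in the /m layer.

```python
def inchi_layers(inchi: str) -> dict:
    """Split a Standard InChI into {layer_prefix: content}.

    The formula layer has no prefix letter, so it is stored under "formula".
    Simplification: a repeated prefix would overwrite the earlier one.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("expected a string starting with 'InChI='")
    segments = inchi.split("/")
    layers = {"version": segments[0], "formula": segments[1]}
    for seg in segments[2:]:
        if seg:
            layers[seg[0]] = seg[1:]
    return layers


def compare_layers(query: str, match: str) -> dict:
    """Return {layer: (query_value, match_value)} for every layer that differs."""
    q, m = inchi_layers(query), inchi_layers(match)
    return {
        name: (q.get(name), m.get(name))
        for name in sorted(set(q) | set(m))
        if q.get(name) != m.get(name)
    }


l_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
d_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
print(compare_layers(l_ala, d_ala))  # {'m': ('0', '1')}
```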

Note that this interface accepts only a single InChIKey or src_compound_id per query. For multiple queries, use the 'key_search' and 'cpd_search' web service methods, which perform the same searches.

Note also that you can modify the search criteria to run more sophisticated queries than the default. To do this, select from the options in the drop-down panel immediately beneath the 'Search' button.



Run a Connectivity Search...
Query term: either an InChIKey or a src_compound_id.
src_compound_id source: when searching with a src_compound_id you must specify the source of the src_compound_id you are referring to.
... or refine your query first by changing some of the options below from their defaults...
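
The two query types, and the rule that a src_compound_id needs its source, can be sketched as a small validation step. This is a hypothetical helper, not part of UniChem: the regular expression encodes the standard InChIKey shape (14 uppercase letters, 10 uppercase letters, one uppercase letter), and the returned dictionary layout is invented for illustration. The example terms come from the example queries on this page.

```python
import re
from typing import Optional

# Standard InChIKey shape: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def build_query(term: str, source: Optional[int] = None) -> dict:
    """Classify a query term and enforce the 'source required' rule.

    The dict layout is hypothetical, for illustration only.
    """
    term = term.strip()
    if INCHIKEY_RE.fullmatch(term):
        return {"type": "InChIKey", "term": term}
    # Anything that is not an InChIKey is treated as a src_compound_id.
    if source is None:
        raise ValueError("a src_compound_id query must specify its source (src_id)")
    return {"type": "src_compound_id", "term": term, "src_id": source}


print(build_query("QJVHTELASVOWBE-AGNWQMPPSA-N"))
print(build_query("CHEMBL15245", source=1))
```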
------------------------------------------------------------------------------------------------------------------------------------

A. Sources...

Criterion 'A' allows you to narrow the results of your search down to one particular source. Leaving this as the default ('All') results in all sources being searched.
 
------------------------------------------------------------------------------------------------------------------------------------

B. Pattern...

Criterion 'B' allows you to define the 'pattern' used in searching.
0 = Connectivity: match on the connectivity layer only (will usually return far more results than '1')
1 = Full InChIKey minus proton flag: match on the connectivity layer and the non-connectivity layer (i.e. the full Standard InChIKey minus the proton flag)
Note that option '1' can be used even when querying with a connectivity layer only.
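
In terms of InChIKey blocks, the two patterns can be sketched as comparing either the first block alone (option 0) or the first two blocks (option 1). This is an illustration under that reading, not UniChem's code. PQLXHQMOHUQAKB-UHFFFAOYSA-P is taken from the examples on this page; the '-N' key and the 'UHFFFAOYSA' variant of QJVHTELASVOWBE are constructed for the example by editing the later blocks.

```python
def pattern_key(inchikey: str, pattern: int) -> str:
    """Return the part of a Standard InChIKey compared under Criterion 'B'.

    pattern 0: connectivity block only (first hash block).
    pattern 1: full key minus the protonation flag (first two blocks).
    """
    first, second, _flag = inchikey.split("-")
    if pattern == 0:
        return first
    if pattern == 1:
        return f"{first}-{second}"
    raise ValueError("pattern must be 0 or 1")


def same_pattern(a: str, b: str, pattern: int) -> bool:
    return pattern_key(a, pattern) == pattern_key(b, pattern)


charged = "PQLXHQMOHUQAKB-UHFFFAOYSA-P"
neutral = "PQLXHQMOHUQAKB-UHFFFAOYSA-N"  # constructed: same key, flag set to N
print(same_pattern(charged, neutral, 1))  # True: the flag is ignored

stereo = "QJVHTELASVOWBE-AGNWQMPPSA-N"
flat = "QJVHTELASVOWBE-UHFFFAOYSA-N"  # constructed: second block changed
print(same_pattern(stereo, flat, 0), same_pattern(stereo, flat, 1))  # True False
```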
------------------------------------------------------------------------------------------------------------------------------------

C. Component Mapping...

Criterion 'C' allows you to search using various Component Mappings between InChIs.
0 = search ONLY for Component Mapping of the type '...matches...'
1 = search ONLY for Component Mapping of the type '...matches components of...'
2 = search ONLY for Component Mapping of the type '...has components matching...'
3 = search ONLY for Component Mapping of the type '...has components matching components of...'
4 = search for ALL FOUR of the above Component Mappings (0-3) simultaneously ('Run all 0-3').
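
The four mapping types can be illustrated by modelling each compound as a set of component connectivity blocks. This is a simplified interpretation, not UniChem's algorithm: "matches" is taken to mean identical component sets, and the placeholder strings loosely echo the amoxicillin and clavulanic acid example below rather than real InChIKey blocks.

```python
from typing import Dict, FrozenSet, List, Optional

Components = FrozenSet[str]


def component_mapping(query: Components, target: Components) -> Optional[int]:
    """Return the Criterion 'C' mapping type (0-3) linking query to target.

    Each compound is modelled as a set of component connectivity blocks;
    a single-component compound is a one-element set. Returns None when
    none of the four mappings applies.
    """
    q_multi, t_multi = len(query) > 1, len(target) > 1
    if query == target:
        return 0  # ...matches...
    if not q_multi and t_multi and query <= target:
        return 1  # ...matches components of...
    if q_multi and not t_multi and target <= query:
        return 2  # ...has components matching...
    if q_multi and t_multi and query & target:
        return 3  # ...has components matching components of...
    return None


def run_search(query: Components, targets: Dict[str, Components], criterion: int) -> List[str]:
    """Keep targets whose mapping is selected; criterion 4 selects all of 0-3."""
    wanted = {0, 1, 2, 3} if criterion == 4 else {criterion}
    return [name for name, comps in targets.items()
            if component_mapping(query, comps) in wanted]


# Placeholder strings stand in for real connectivity blocks.
amox = frozenset({"AMOX"})
mixture = frozenset({"AMOX", "CLAV"})
targets = {"mixture": mixture, "amox": amox, "other": frozenset({"XYZ"})}
print(run_search(amox, targets, 4))  # ['mixture', 'amox']
```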
------------------------------------------------------------------------------------------------------------------------------------
More Options...
------------------------------------------------------------------------------------------------------------------------------------

D. Frequency Block...

Criterion 'D' blocks sub-queries C1 and C3 for a single-component query InChI on the basis of how frequently that InChI occurs as a component of multi-component InChIs in UniChem. The default is '200', which can be selected by entering '200', '0' or nothing; otherwise the user may specify any value between 1 and 500.
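
The input handling for Criterion 'D' can be sketched as follows. The default of 200, the meaning of '0' or an empty value, and the 1-500 range come from the text above; the helper names are hypothetical, and the rule that blocking applies when the frequency exceeds the threshold is an assumption, since the page does not state the exact comparison.

```python
from typing import Optional

DEFAULT_BLOCKING_FREQUENCY = 200


def blocking_threshold(value: Optional[int] = None) -> int:
    """Normalise the Criterion 'D' input: None (empty) or 0 means the default of 200."""
    if value is None or value == 0:
        return DEFAULT_BLOCKING_FREQUENCY
    if not 1 <= value <= 500:
        raise ValueError("blocking frequency must be between 1 and 500")
    return value


def blocked_subqueries(frequency: int, value: Optional[int] = None) -> set:
    """Return the sub-queries (C1, C3) to skip for a single-component query.

    `frequency` is how many multi-component InChIs contain this component.
    Assumption: blocking applies when frequency exceeds the threshold.
    """
    return {1, 3} if frequency > blocking_threshold(value) else set()


print(blocked_subqueries(5000))      # {1, 3} with the default threshold
print(blocked_subqueries(150, 100))  # {1, 3} with a user-chosen threshold
```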
------------------------------------------------------------------------------------------------------------------------------------

E. InChI length block...

Criterion 'E' allows you to block sub-queries C1 and C3 for a single-component query InChI on the basis of the length of its Standard InChI up to the end of the connection layer. By default this block is not used, which is specified with '0' (or by leaving the field empty). Otherwise the value may be set anywhere between 1 and 2000.

Note that this block cannot override the mandatory block imposed by criterion D.
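
The length measure behind Criterion 'E' can be sketched by cutting a Standard InChI after its /c (connection) layer. The helper names are hypothetical, and several details are assumptions not stated on this page: the 'InChI=1S/' prefix is counted, an InChI without a /c layer is measured to the end of its formula, and blocking applies when the length exceeds the limit. The test string is the Standard InChI of ethanol.

```python
from typing import Optional


def length_to_connection_layer(inchi: str) -> int:
    """Length of a Standard InChI up to the end of its /c (connection) layer.

    Assumptions: the 'InChI=1S/' prefix is counted, and an InChI with no
    /c layer is measured up to the end of its formula.
    """
    segments = inchi.split("/")
    end = 2  # version + formula
    for i, seg in enumerate(segments[2:], start=2):
        if seg.startswith("c"):
            end = i + 1
            break
    return len("/".join(segments[:end]))


def length_block_applies(inchi: str, limit: Optional[int] = None) -> bool:
    """Criterion 'E': None (empty) or 0 disables the block; otherwise 1-2000.

    Assumption: C1 and C3 are blocked when the length exceeds the limit.
    This check never lifts a block already imposed by Criterion 'D'.
    """
    if limit is None or limit == 0:
        return False
    if not 1 <= limit <= 2000:
        raise ValueError("InChI length limit must be between 1 and 2000")
    return length_to_connection_layer(inchi) > limit


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
print(length_to_connection_layer(ethanol))  # 21 ("InChI=1S/C2H6O/c1-2-3")
```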

------------------------------------------------------------------------------------------------------------------------------------

F. UniChem Labels...

Criterion 'F' allows you to make use of UniChem's annotation labels for connectivity components. This is turned on by default ('0'), which populates the 'label' field of the output with UniChem labels.

Selecting '1' switches this behaviour off and gives a small performance advantage. A listing of the currently used UniChem labels is also available.

------------------------------------------------------------------------------------------------------------------------------------

G. Assignments...

Criterion 'G' allows you to retrieve obsolete assignments as well as current ones. By default ('0'), only currently assigned records are retrieved. Setting this criterion to '1' retrieves both current and obsolete assignments.

------------------------------------------------------------------------------------------------------------------------------------


Example Queries

Some commonly occurring structures (multiple and single component InChIs)...

  • CHEMBL15245 ... a ChEMBL (src_id 1) src_compound_id for Yohimbine.
  • CHEMBL2097107 ... the ChEMBL src_compound_id for AMOXYCILLIN and CLAVULANIC ACID. Setting criterion 'G' to '1' will also retrieve CHEMBL8156, an obsolete ChEMBL ID with alternative stereochemistry.
  • QJVHTELASVOWBE-AGNWQMPPSA-N ... the InChIKey for CHEMBL2097107


An example of a src_compound_id (a patent id [src_id 13]) assigned to multiple InChIs...

  • US5244668


InChIKeys which by themselves do not exist in UniChem...

  • PQLXHQMOHUQAKB-UHFFFAOYSA-P ...but shares connectivity with a structure in UniChem, and UniChem can automatically calculate the InChI and proceed with the query.
  • CTMTYSVTTGVYAW-FRRDWIJNSA-N ...but shares connectivity with a structure in UniChem (note the invitation to re-query).
  • ONNGJOMABHCYGX-HEFFQPSWSA-N ...but shares connectivity with components of a structure in UniChem (note the invitation to re-query with criterion C set to '4').
  • TWRWROOHGNQOQC-XRIOVQLTSA-L ...and does not share connectivity with anything in UniChem.
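
The final character of several of these keys (for example '-P' and '-L') is the InChIKey protonation flag, the part that Criterion 'B' option 1 ignores. The sketch below decodes it under the common convention that 'N' means no protons added or removed and each letter step away from 'N' is one proton ('O' = +1, 'P' = +2, 'M' = -1, 'L' = -2). Treating the mapping as linear across the whole alphabet is an assumption; the InChI Technical Manual is the authority on the extreme letters.

```python
import string


def protonation_shift(inchikey: str) -> int:
    """Decode the protonation flag (last character) of a Standard InChIKey.

    Assumes a linear mapping around 'N' (0): 'O' = +1, 'P' = +2,
    'M' = -1, 'L' = -2. Extreme letters may carry special meaning.
    """
    flag = inchikey.rsplit("-", 1)[-1]
    if len(flag) != 1 or flag not in string.ascii_uppercase:
        raise ValueError(f"unexpected protonation flag: {flag!r}")
    return ord(flag) - ord("N")


for key in ("PQLXHQMOHUQAKB-UHFFFAOYSA-P",
            "TWRWROOHGNQOQC-XRIOVQLTSA-L",
            "QJVHTELASVOWBE-AGNWQMPPSA-N"):
    print(key, protonation_shift(key))  # +2, -2 and 0 respectively
```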

