SMSD is a Java-based software library for calculating the Maximum Common Subgraph (MCS) between small molecules. It calculates the similarity between two small molecules using an in-house algorithm developed at EMBL-EBI.
Platform-independent tool: works on all Java-compatible operating systems (Linux, Windows, macOS).
Here are three ways you can use this software library:
a) As a GUI tool.
a) SMSD GUI
To start the GUI, double-click the SMSD.jar file or run the following command in a shell:
java -jar "SMSD.jar"
To distribute this project, zip up the dist folder (including the lib folder) and distribute the ZIP file.
b) Command line options
i) On Windows, use SMSD.bat
SMSD reads various file types. To see a list of supported types, run SMSD with "-h"; the list of types, with a description of each, is shown at the end of the help. The type is a short identifier ('MOL', 'PDB', etc.) that tells SMSD what to expect. The query and the target files can have different types, for example:
./SMSD -Q MOL -q molfile.mol -T PDB -t pdbfile.pdb
Here the uppercase flags (-Q and -T) give the types, and the lowercase flags (-q and -t) give the filenames. For 'string' types, such as SMILES, the data itself is given in place of the filename:
./SMSD -Q SMI -q "CCC" -T PDB -t pdbfile.pdb
Note that, while the quotes may not always be necessary, they prevent problems with more complex SMILES strings.
Types can also be used with the output subgraph. There is a corresponding pair of -o/-O flags for the output filepath and output filetype, respectively. So, to write the subgraph to a molfile, use:
./SMSD -Q SMI -q "CCC" -T PDB -t ATP.pdb -O MOL -o subgraph.mol
For convenience, the output filepath can be given as the special name "--", which means "write to stdout". This is a quick way to see the subgraph, especially if the output filetype is SMI:
./SMSD -Q SMI -q "CCC" -T SMI -t "CCN" -O SMI -o --
This will just print "CC", as that is the maximum common subgraph of these two SMILES strings.
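For scripting, the stdout form can be captured with command substitution. The following is a minimal sketch, assuming SMSD prints only the SMILES string on stdout; mcs_smiles and SMSD_CMD are hypothetical names introduced here for illustration, not part of SMSD:

```shell
# Path to the SMSD launcher; SMSD_CMD is an assumed variable, override as needed.
SMSD_CMD="${SMSD_CMD:-./SMSD}"

# Hypothetical helper: print the MCS of two SMILES strings as SMILES.
# $1 = query SMILES, $2 = target SMILES. The double quotes keep characters
# such as ( ) and = from being interpreted by the shell.
mcs_smiles() {
    "$SMSD_CMD" -Q SMI -q "$1" -T SMI -t "$2" -O SMI -o --
}

# Example use (needs a working SMSD installation):
#   mcs=$(mcs_smiles "CCC" "CCN")
#   echo "MCS: $mcs"
```

Keeping the launcher path in a variable makes it easy to point the helper at SMSD.bat or at a different install location.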
To generate an image of the isomorphism, use the -g flag, like this:
./SMSD -Q MOL -q ADP.mol -T MOL -t ATP.mol -g
This will generate an image named "ADP_ATP.png".
The name of the image is generated from the names of the query and target input files. If string-format molecules are used, the strings themselves are used; for example, "CCC_CCCC.png". The size of the image can be changed with the -d flag; for example, -d 400x200 creates an image with a width of 400 and a height of 200.
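When scripting around -g, it can help to know the image name in advance. Below is a minimal sketch of the naming rule described above. It assumes SMSD drops both the directory and the file extension of each input; only the extension case is shown in this document. smsd_image_name is a hypothetical helper, not part of SMSD:

```shell
# Hypothetical helper: predict the image name SMSD writes for a query/target pair.
# Assumes the directory and the last extension of each input are dropped.
smsd_image_name() {
    query=$(basename "$1")
    target=$(basename "$2")
    printf '%s_%s.png\n' "${query%.*}" "${target%.*}"
}

smsd_image_name ADP.mol ATP.mol   # prints ADP_ATP.png
smsd_image_name CCC CCCC          # prints CCC_CCCC.png
```

The helper is only meant for file names and simple strings; SMILES strings containing "/" or "." would not be handled correctly by it.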
Image options can also be passed on the command line using the syntax "-Iopt=value"; for example, "-IdrawAromaticCircles=true". To get a list of the options, along with their default values, use the -I flag without any arguments.
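The image flags can be combined in a single call. The sketch below uses only the flags described above; render_mcs_image and SMSD_CMD are hypothetical names, and the order of the flags is an assumption:

```shell
# Path to the SMSD launcher; SMSD_CMD is an assumed variable, override as needed.
SMSD_CMD="${SMSD_CMD:-./SMSD}"

# Hypothetical helper: draw the MCS of two molfiles.
# $1 = query molfile, $2 = target molfile,
# $3 = optional WIDTHxHEIGHT (this sketch defaults to 400x200).
render_mcs_image() {
    "$SMSD_CMD" -Q MOL -q "$1" -T MOL -t "$2" -g -d "${3:-400x200}" -IdrawAromaticCircles=true
}

# Example use (needs a working SMSD installation):
#   render_mcs_image ADP.mol ATP.mol 800x400
```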
To get the maximum common subgraph (MCS) of a set of molecules, provide only a target file, which must be in a multi-molecule format such as SDF. As an example:
./SMSD -T SDF -t arom.sdf -N -g
This will produce a 'hub-wheel' image, in which the central (hub) molecule is the MCS of the molecules around the rim of the wheel.
Just use ./SMSD -I to list all the image options.
Usage and command line options:
Allowed types for single-molecules (query):
Allowed types for multiple-molecules (targets only):
NOTE: Remove hydrogens before performing graph matching.
3) Classes and Methods Summary