
Sequence search

MGnify maintains a non-redundant database of predicted proteins obtained from the analysis of assemblies. The sequences can be searched on the MGnify website via the ‘Sequence search’ tab (Figure 12A).

Users can enter a protein query sequence in FASTA format and choose the database they wish to search against. The database can be filtered by sequence type (full-length, partial, or all proteins) or by the biome of the source study. After setting the parameters and running the query, the user is presented with a table listing the matching proteins and their corresponding E-values. Using the ‘Customise’ button, the results table can be extended to include, for example, the bit score for each match, a graphical representation of the match positions, a link to the corresponding protein in UniProt (where one exists), and the samples and runs from which the protein is derived (Figure 12B). Full documentation on the sequence search facility can be found here.
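A query must be a protein sequence in FASTA format: a header line starting with `>` followed by the residues, conventionally wrapped at a fixed line width. The sketch below shows one way to prepare such a query locally before pasting it into the search box. It is illustrative only: the function name `to_fasta`, the 60-residue line width and the accepted alphabet (the 20 standard amino acids plus the ambiguity codes B, Z and X) are assumptions, not MGnify requirements.

```python
# Assumed alphabet: the 20 standard amino acids plus the ambiguity codes
# B, Z and X. The exact characters accepted by the MGnify search form are
# not specified here, so adjust this set if needed.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWYBZX")


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return a single protein sequence formatted as a FASTA record.

    Whitespace inside the sequence is removed and residues are upper-cased.
    A ValueError is raised for an empty sequence or for any character
    outside PROTEIN_ALPHABET.
    """
    residues = "".join(sequence.split()).upper()
    if not residues:
        raise ValueError("empty sequence")
    invalid = sorted(set(residues) - PROTEIN_ALPHABET)
    if invalid:
        raise ValueError(f"non-protein characters in sequence: {invalid}")
    # The header line starts with '>'; strip any '>' the caller already added.
    name = header.strip().lstrip(">")
    # Wrap the sequence into lines of at most `width` residues.
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([">" + name] + lines) + "\n"


if __name__ == "__main__":
    # Hypothetical query: the header and residues are illustrative only.
    print(to_fasta("my_query", "mktayiakqr qisfvkshfs rq"), end="")
```

Running the script prints the header line `>my_query` followed by `MKTAYIAKQRQISFVKSHFSRQ`. Note that this check only rejects obviously invalid characters: a nucleotide sequence made of A, C, G and T would still pass, because those letters are also valid amino-acid codes.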

Figure 12 Sequence search: the MGnify sequence search allows users to query the MGnify non-redundant protein database (generated from assembly analyses) with a FASTA sequence. (A) Users can restrict the protein set to be queried using a series of sequence database filters, including sequence type and source biome. Users can also specify their own E-value or bit score threshold. (B) Once the query has been submitted, a table of results is returned listing the matches in the specified set. Users can customise the information included in this table and download it in a variety of machine-readable formats.
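The two thresholds work in opposite directions: a smaller E-value means a match is less likely to have arisen by chance, whereas a larger bit score indicates a better alignment. The sketch below applies both kinds of cut-off to a list of hits, for example after downloading the results table. The `Hit` record, its field names and the toy values in the usage block are hypothetical and do not reflect the actual MGnify download format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Hit:
    """One row of a search-result table (hypothetical field names)."""
    accession: str
    evalue: float
    bitscore: float


def filter_hits(hits: Iterable[Hit],
                max_evalue: Optional[float] = None,
                min_bitscore: Optional[float] = None) -> List[Hit]:
    """Keep hits that pass the chosen thresholds, sorted by ascending E-value.

    Smaller E-values are better, so a hit is kept when evalue <= max_evalue.
    Larger bit scores are better, so a hit is kept when bitscore >= min_bitscore.
    A threshold left as None is not applied.
    """
    kept = [
        h for h in hits
        if (max_evalue is None or h.evalue <= max_evalue)
        and (min_bitscore is None or h.bitscore >= min_bitscore)
    ]
    return sorted(kept, key=lambda h: h.evalue)


if __name__ == "__main__":
    # Toy values for illustration only, not real search results.
    hits = [
        Hit("hit_a", 1e-30, 120.5),
        Hit("hit_b", 0.5, 22.0),
        Hit("hit_c", 1e-5, 40.1),
    ]
    print([h.accession for h in filter_hits(hits, max_evalue=1e-3)])
```

With these toy values the script prints `['hit_a', 'hit_c']`. Passing `min_bitscore=50` instead would keep only `hit_a`. Either threshold can be used alone or both can be combined, mirroring the choice offered in the search form.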