Sequence similarity search
BLAST stands for Basic Local Alignment Search Tool. It compares a novel sequence with those held in nucleotide and protein databases by aligning the novel sequence with previously characterised genes. The tool finds regions of sequence similarity, which yield clues about the structure, function and evolution of the novel sequence. Alignments can be either global, where the sequences are aligned along their entire length, or local, where regions of similarity are detected within otherwise unrelated sequences. As its name indicates, BLAST performs local alignments.
Help with IPD-MHC BLAST
The EBI BLAST engine is now automatically configured for IPD-MHC searches when you access it directly from this page. If you are accessing the page from another location, or wish to alter the parameters provided, please read the guidelines below.
- The IPD-MHC BLAST performs a BLAST query using the EBI BLAST REST service.
- By default, the search is performed on the IPD-MHC and IMGT/HLA databases.
- The sequence can contain spaces but should not contain numbers or any form of formatting, such as asterisks (*) or periods (.), as BLAST considers these invalid nucleotides. The IUB (International Union of Biochemistry) codes are acceptable in a BLAST search. However, if the sequence contains more than 50% ambiguity codes it will most likely be rejected, and if the search does succeed it may return false positive hits, which are of limited use.
- The BLAST engine is designed for searching for sequence similarities over large sequences. Searching the databases with short sequences may result in an error. The minimum length is 11 bases, but the recommended minimum sequence length is 22 bases (nucleotides/blastn) or 6 amino acids (proteins/blastp). Some searches below the recommended size may run, but even a single mismatch can cause the search to fail.
- The BLAST scoring system can sometimes distort the results. For example, a 546 base pair sequence of ~95% identity may score higher than a shorter sequence (270 bp, for example) of 100% identity if the polymorphisms occur within a few bases of the start or end of the library sequence. The top results may therefore not always be the best match. This is due to the high degree of similarity between sequences.
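
The input rules and the scoring caveat above can be illustrated with a short Python sketch. It is not part of the IPD-MHC or EBI service. The function names, the thresholds expressed as constants, and the match/mismatch scores (+1/-3) are assumptions made for illustration, and the score is a simplified ungapped raw score rather than the value BLAST actually reports. Only nucleotide queries are covered.

```python
# Hypothetical pre-submission checks for a nucleotide query, based on the
# guidelines above. Names and structure are illustrative assumptions.
UNAMBIGUOUS = set("ACGT")
AMBIGUOUS = set("RYSWKMBDHVN")   # IUB/IUPAC nucleotide ambiguity codes

MIN_LENGTH = 11           # minimum length stated in the guidelines
RECOMMENDED_LENGTH = 22   # recommended minimum for blastn
MAX_AMBIGUITY = 0.5       # above 50% ambiguity codes, rejection is likely


def check_nucleotide_query(raw):
    """Return (cleaned_sequence, problems) for a nucleotide query string."""
    seq = "".join(raw.split()).upper()   # spaces are allowed, so drop them
    problems = []

    # Digits, asterisks, periods and other formatting are not valid bases.
    invalid = sorted(set(seq) - UNAMBIGUOUS - AMBIGUOUS)
    if invalid:
        problems.append("invalid characters: " + "".join(invalid))

    if len(seq) < MIN_LENGTH:
        problems.append("shorter than 11 bases")
    elif len(seq) < RECOMMENDED_LENGTH:
        problems.append("shorter than the recommended 22 bases")

    ambiguous = sum(1 for base in seq if base in AMBIGUOUS)
    if seq and ambiguous / len(seq) > MAX_AMBIGUITY:
        problems.append("more than 50% ambiguity codes")

    return seq, problems


def ungapped_raw_score(length, identity, match=1, mismatch=-3):
    """Simplified raw score of an ungapped alignment (assumed scores)."""
    matches = round(length * identity)
    return matches * match + (length - matches) * mismatch


print(check_nucleotide_query("ACGT ACGT ACGT ACGT ACGT AC"))
print(check_nucleotide_query("ACGT*ACG.T1"))

# Under the assumed scores, the longer 95% match outscores the shorter
# 100% match: 438 versus 270.
print(ungapped_raw_score(546, 0.95), ungapped_raw_score(270, 1.0))
```

Because the longer alignment accumulates more matching positions, it can rank first despite its mismatches, so it is worth comparing percent identity and alignment length as well as rank when reading the results.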