2Can Support Portal - Nucleotide Analysis


Checking for vector contamination - Introduction


DNA/RNA from a biological source is usually inserted into a cloning vector (e.g. a plasmid or phage) so that it can be cloned. Sequencing of such constructs frequently produces raw sequences that include segments derived from the vector. In addition, oligonucleotides can be attached to the DNA/RNA under investigation as part of the cloning or amplification process. The sequences of these oligonucleotides are often present in raw sequences and will contaminate the finished sequence unless they are identified and removed. Transposable elements from the cloning host (generally bacteria or yeast) can also insert themselves into the cloned DNA/RNA while the clone is being propagated, and will then be sequenced along with it, as can other DNA/RNA contaminants.

More than 200 reports can be found in the public literature on the subject of vector sequence contamination in the major sequence databases. All of these reports try to alert the scientific community to the potential pitfalls associated with such contamination. Vector sequences can be found in the sequence databanks for various reasons. The most common cause is that submitters forget to remove them. Another is that the vectors themselves are submitted to the databanks because they are, after all, sequences that others might find useful.

The publication of a newly discovered gene or gene fragment requires that the authors submit the sequence to the public databanks in order to obtain an accession number. This number is unique for each submission and helps scientists identify their sequence amongst the myriad of sequences being produced each day by the many world-wide sequencing efforts. Furthermore, without an accession number, there can be no publication.

Scientists are often in a hurry to submit their sequences and innocently forget to remove part of the cloning vector they used to obtain them. This is frequently the poly-linker and the inserts related to it. Other parts of cloning vectors are also found in eukaryotic sequence submissions, having accidentally made their way into an otherwise genuine gene. In general this implies that some sort of rearrangement has taken place during the cloning experiments. Accidents do happen, and the more we sequence, the more submissions with vector contamination will occur.

In an effort to assist submitters, the EBI now provides a vector screening service called BLAST2 EVEC. It is based on NCBI BLAST2, using the latest implementation of the BLAST algorithm, and checks your sequences for vector contamination against a special sequence databank known as EMVEC.

EMVEC is a set of over 2000 sequences, extracted from the synthetic division of EMBL, that are commonly used in cloning and sequencing experiments. EMVEC is by no means a complete vector databank, but the EBI believes it is representative of the kind of material used in modern sequencing and should be useful to submitters.
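
The idea behind such a screen can be illustrated with a toy sketch in Python. This is not BLAST and it does not query EMVEC: it simply reports the stretches of a read that share exact 12-base words with a known vector sequence on either strand. The names VECTOR, reverse_complement, vector_words and find_vector_segments are hypothetical, and both the vector string and the read are invented for illustration.

```python
# Toy illustration only: exact word matching, NOT the BLAST algorithm,
# and no access to the EMVEC databank. All sequences here are made up.

# Short polylinker-like string standing in for a vector sequence (assumption).
VECTOR = "GAATTCGAGCTCGGTACCCGGGGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTT"


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def vector_words(vector, k):
    """Collect every k-base word of the vector, from both strands."""
    words = set()
    for strand in (vector, reverse_complement(vector)):
        for i in range(len(strand) - k + 1):
            words.add(strand[i:i + k])
    return words


def find_vector_segments(query, vector, k=12):
    """Return merged, half-open (start, end) intervals of the query
    that are covered by k-base words also present in the vector."""
    query = query.upper()
    words = vector_words(vector.upper(), k)
    segments = []
    for i in range(len(query) - k + 1):
        if query[i:i + k] in words:
            if segments and i <= segments[-1][1]:
                # Overlaps or touches the previous hit: extend it.
                segments[-1] = (segments[-1][0], i + k)
            else:
                segments.append((i, i + k))
    return segments


# A made-up read: 30 bases of vector followed by an artificial AT-only
# insert, chosen so that it cannot share a 12-base word with VECTOR.
insert = "TTATATTAAATATTTATAATTTAATATAT"
read = VECTOR[:30] + insert

print(find_vector_segments(read, VECTOR))  # [(0, 30)]
```

The reported interval marks the part of the read that would need to be trimmed before submission. A real screen tolerates mismatches and gaps and scores its hits statistically, which exact word matching does not attempt.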
