Sequin is a program designed to aid in the submission
of sequences to the GenBank, EMBL, and DDBJ sequence
databases. It was written at the National Center for
Biotechnology Information, part of the National Library
of Medicine at the National Institutes of Health. This
section of the help document provides a basic overview
of how to submit sequences using the Sequin forms. Subsequent
sections provide detailed instructions for entering
information on each form.
The Sequin help documentation is available in both
on-line and World Wide Web (http://www.ncbi.nlm.nih.gov/Sequin/sequin.hlp.html)
formats. The text of the on-line version scrolls as
you progress through the Sequin forms. Specific words
or phrases can be identified with the "find" command
at the top of the window. The on-line document can also
be saved as a text file, or printed directly to a printer.
Click on the window which contains the help documentation.
Under the Sequin File menu, choose Export Help... to
save the documentation as a text file. To print the
documentation without saving it first, click on the
help window, and choose Print from the Sequin File menu.
Information is entered into Sequin on a number of different
forms. Each form is made up of pages which are indicated
by folder tabs at the top of the form. You can move
to the desired page by clicking on the appropriate folder
tab. You can also move between pages of a form by clicking
on the "Next page" or "Prev page" buttons at the bottom
of the screen. You can move to the previous form or
the next form by clicking on the "Prev form" or "Next
form" buttons on the first or last pages of a form,
respectively.
There are two levels of folder tabs. Tabs with large
bold lettering indicate pages which should be filled
out for every entry. Tabs with smaller lighter lettering
indicate minor pages which provide access to infrequently
used parameters.
There are numerous ways to enter information onto a
page of a form. Many of these, such as text fields,
radio buttons, check boxes, and scrolling boxes, are
standard in other computer programs and will not be
described here. Sequin does employ two less standard
data entry options, pop-up menus and spreadsheets.
Pop-up menus: When the mouse is clicked in one of these
menus, a list of choices is displayed. Select the correct
option by moving the mouse to that option and letting
go. Only one option can be selected.
Spreadsheets: These are fields which change their size
in response to the amount of information entered. After
you input information into a field, another field will
appear in which you can enter additional information
if necessary.
If you are using Sequin for the first time, you will
be prompted to fill out four forms: the Welcome to
Sequin form, the Submitting Authors Form, the Sequence
Format form, and the Organism and Sequences Form. After
you have filled out these forms, a window will appear
which contains the Sequin record viewer. This viewer
allows you to access many other forms in which you can
edit fields filled out in the four initial forms, as
well as add additional information you feel should be
included in the submission. Detailed instructions on
how to fill out the forms and use the record viewer
are presented below.
This window allows you to choose the type of project
you want to work on. First, indicate, with one of the
three radio buttons, whether you are submitting the
sequence to the GenBank, EMBL, or DDBJ database. If
you are working on a sequence submission for the first
time, click on "Start New Submission." If you are modifying
an existing submission record, click on "Read Existing
Record." If you would like to quit from Sequin, click
on "Quit Program."
You can also "Read Existing Record" to read in a FASTA-formatted
sequence file for analysis purposes. The sequence will
be displayed in Sequin and can be analysed with tools
such as PowerBLAST, but it should not be submitted,
because it does not have the appropriate annotations.
If you are running Sequin in its network-aware mode,
you will see another button labelled "Download from
Entrez." This option allows you to update an existing
database record using Sequin. The record will be downloaded
from GenBank into Sequin using the NCBI's Entrez retrieval
system. The contents of the record will appear in Sequin,
and you can edit them by updating the sequence or the
annotations, as necessary. If you do not see the button
labelled "Download from Entrez" on the Welcome to Sequin
form, you are not running Sequin in its network-aware
mode. To make Sequin network-aware, see the
instructions later on in the help documentation.
Please note that, at present, you can update only those
records which you have previously submitted yourself.
To update an existing record, first select which of
the databases you will be sending the update to. This
should be the database to which the original record
was submitted. If you do not know which database to
use, send the record to GenBank and the NCBI staff will
forward it to the appropriate database. Next, click
on the button "Download from Entrez." Enter the accession
number or GI of the sequence on the first form. Then
enter "yes" if you are planning to submit the record
as an update to one of the databases. Fill out the Submitting
Authors form. Instructions
for this form are found in the Sequin help documentation
under "Edit Submitter Info" under the Sequin File menu.
The record will then open in Sequin. Explanations
of how to add annotations or update sequences are presented
in the documentation entitled
"Editing the record" and
"Sequence Editor", respectively. You will not
see the Submitting Authors Form, the Sequence Format
Form, or the Organism and Sequences Form. Note that
updates, as well as new records, must be emailed to
the appropriate database. Sequin does not support direct
submission of records over the Internet.
Additional configuration options are available under
the Misc menu. First, you can toggle between the stand-alone
and network aware modes of Sequin. The default mode
of Sequin, which is sufficient for most sequence submissions,
is stand-alone. In its network aware mode, Sequin can
exchange data with the NCBI and, for example, retrieve
sequences from Entrez and perform BLAST searches. The
network aware mode of Sequin is described in detail
in the Net Configure section,
below. Second, if you are running Sequin in its network
aware mode, you can query NCBI's Entrez database. Further
information about how to query Entrez is
available. Third, you can start the NCBI DeskTop.
The DeskTop, which is for advanced Sequin users only,
is described below.
Information from this form will be used as a citation
for the sequence entry itself. It can contain the same
information found in citations associated with the formal
publication of the sequence.
On the bottom of each form are two buttons. Click "Prev
form" (first page in a form) or "Prev page" (subsequent
pages in a form) to go to the previous form or page.
Click "Next Form" (last page on a form) or "Next Page"
(earlier pages on a form) to move to the next form or
page.
Form pages can also be saved individually by using
the "Export" function under the File menu. If you are
processing multiple submissions, you can use the "Import"
function under the File menu to paste previously entered
information directly on the page.
The Contact, Authors, and Affiliation pages can be
saved as a block so that you can use this information
for your next submission. For your first Sequin submission,
fill in the requested information on the Submitting
Authors form, and proceed with the preparation of the
submission. In the record viewer, when the submission
is basically finished, click on "Edit Submitter Info"
under the Edit menu. Under the File menu in the resulting
Submission Instructions form, click on Export Submitter
Info to save the information to a file. For subsequent
Sequin submissions, if you have already saved the submitter
information, click on Import Submitter Info under the
File menu on the Submission page of the Submitting Authors
form. You must still fill in the manuscript title on
the Submission page, though.
Please select one of the two radio buttons. If you
select "Yes," the entry will be released to the public
after the database staff has added it to the database.
If you select "No," fields will appear in which you
can indicate the date on which the sequences should
be released to the public. The submission will then
be held back by the database staff until formal publication
of the sequence or GenBank Accession Number, or until
the Release Date, whichever comes first.
Please enter a title which appropriately describes
the sequence entry. This is a title for the sequence
submission, and you may or may not want it to be the
same as the title of an article in which the sequence
is described. Later in the submission process, you will
have the opportunity to change this information and
add references from published or in press works which
describe the sequence. Please do not enter a name for
the sequence itself.
Please enter the name, telephone and fax numbers, and
e-mail address of the person who is submitting the sequence.
This is the person who will be contacted regarding the
sequence submission. This person does not have to be
on the list of authors involved in the sequencing. The
phone, fax, and e-mail address will not be visible in
the database record.
Please enter the names of the people who should receive
scientific credit for the generation of sequences in
this entry. The person on the Contact page is automatically
listed as the first author. This information can be
changed if necessary. Note that the first name of the
author is listed first. You can add as many authors
to this page as you wish. After you type in the name
of the third author, the box becomes a spreadsheet,
and you can scroll down to the next line by using the
thumb bar.
Please enter information about the principal institution
in which the sequencing and/or analysis were carried
out. If multiple labs were involved in the project,
this page should contain information about the workplace
of the senior author. This is not necessarily the same
as the workplace of the person described on the Contact
page. This information will show up in the reference
section of the record, with the title Direct Submission.
Use this form to indicate the type of sequence you
are submitting, as well as the format of the sequence.
In addition to being able to process single nucleotide
sequences, Sequin can also process sets of related sequences,
for example, segmented sequences and sequences from
phylogenetic, population, and mutation studies. Although
the sequences are handled as a single submission, each
sequence in a set will receive its own database accession
number and can be annotated independently.
Sequin and Entrez (a sequence, structure, and citation
browser available from the NCBI) are now both able to handle
and display aligned sets of closely related sequences.
These alignments will not be visible in the standard
GenBank, EMBL, or DDBJ database entries. At this time,
Sequin will accept sets of sequences from population,
phylogenetic, and mutation studies which are in either
FASTA, FASTA+GAP, PHYLIP, NEXUS contiguous, or NEXUS
interleaved format. If the sequences are in FASTA format,
Sequin will generate an alignment. If the sequences
have already been aligned in FASTA+GAP, PHYLIP or NEXUS,
Sequin will not change the alignment. Single sequences
must be in FASTA format; aligned, segmented sequences
can be in FASTA or FASTA+GAP format. FASTA format is
explained below.
Use the radio buttons to indicate which of the following
types of submission you are creating:
- Single sequence: Select this option if you are submitting
a single sequence, such as a single mRNA or genomic
DNA sequence.
- Segmented sequence: A segmented set of nucleotide
sequences is a collection of non-overlapping sequences
which cover a specified genetic region. A standard
example is a set of genomic DNA sequences which encode
exons from a gene along with fragments of their flanking
introns. If the segmented set is part of an alignment,
however, select the appropriate Population, Phylogenetic,
or Mutation study button.
- Population study: Select this option if you are
submitting a set of sequences which make up a population
study, that is, if the sequences were derived by sequencing
the same gene from different isolates of a single
organism. If you want the sequences to be part of
an alignment, you can either import them in a pre-aligned
format, or ask Sequin to align them.
- Phylogenetic study: Select this option if you are
submitting a set of sequences which make up a phylogenetic
study, that is, if the sequences were derived by sequencing
the same gene from different organisms. If you want
the sequences to be part of an alignment, you can
either import them in a pre-aligned format, or ask
Sequin to align them.
- Mutation study: Select this option if you are submitting
a set of sequences which make up a mutation study,
that is, if the sequences were derived by sequencing
multiple mutations in a single gene. If you want the
sequences to be part of an alignment, you can either
import them in a pre-aligned format, or ask Sequin
to align them.
- Batch submission: Select this option if you are
submitting a set of unrelated sequences. Sequin will
not attempt to align the sequences.
Use the radio buttons to select one of the data formats.
If you are submitting a single or segmented sequence,
or a batch submission, your sequence must be in FASTA
format, described below. If you are submitting a set
of sequences as part of a population, phylogenetic,
or mutation study, you have a choice of sequence formats.
You may submit the set as individual sequences in FASTA
format. However, if your sequences are already aligned,
you can submit the sequences as part of an alignment.
Sequin currently accepts the alignment formats FASTA+GAP,
PHYLIP, NEXUS Interleaved, and NEXUS Contiguous. All
formats are described in the
Nucleotide Page, below.
This form is made up of three pages. On the first page,
the Organism page, you indicate the organism from which
the sequence derives. However, as explained below, in
the case of a set of sequences submitted as part of
a phylogenetic study, the organism is indicated either
in the sequence file itself or on the following
Source Modifiers form. The second page, the Nucleotide
page, prompts you to import the nucleotide sequence(s)
into Sequin from a separate computer file. The identity
of the third page changes depending on the submission
type indicated on the previous Sequence Format page.
If you are submitting a single or segmented sequence,
this page is a Protein page, which prompts you to import
an amino acid translation of the nucleotide sequence.
If you are submitting a population, phylogenetic, or
mutation study, this page is an Annotation page, which
allows you to add certain annotations to your nucleotide
sequence.
Information about the organism from which the sequence
was derived should be entered on this page. Alternatively,
for any type of submission, the name of the organism,
as well as additional information, can be indicated
instead in the file which contains the nucleotide sequence.
Indeed, if you are submitting a set of sequences for
a phylogenetic study, you will not be able to fill out
any information on the Organism page. Instead, you must
indicate the organism names in the sequence file or
on the following Source
Modifiers form. A detailed description of how to
format this organism information is presented in the
documentation for the Nucleotide
Page, below.
The scrollable list contains the scientific names of
many organisms. To reach a name on the list, type the
first few letters of the scientific name into the appropriate
field. The list will scroll to the appropriate place,
and you can select the organism. When you choose a name
from the list, the Scientific Name and the Genetic Code
for Translation fields are filled out automatically.
If there is a common name for the organism, the Common
Name field will be filled out as well. You can also
use the thumb bar to reach the appropriate part of the
list. If you have any questions about the scientific
or common name of an organism, see the NCBI
taxonomy browser.
If the name of the organism is not on the list, type
it in directly. If you do not know the scientific name,
you can provide another species-level indication like
"Paramecium sp.", or "Unidentified green algae X457".
Additional information like subspecies, strain, isolates,
or serotype can be entered later in the submission process.
From the selection list, please enter the location
of the genome which contains your sequence. Most entries
will have a "Genomic" location. The following is a brief
description of the choices in this pop-up menu:
- Genomic: Sequence is located in a chromosome. This
category includes mitochondrial and chloroplast proteins
which are encoded by the nuclear genome.
- Chloroplast: Sequence is found in plant chloroplast
DNA.
- Kinetoplast: Sequence is found in the DNA of a trypanosome
kinetoplast.
- Mitochondrion: Sequence is found in mitochondrial
DNA.
- Macronuclear: Sequence is found in the macronucleus
of a ciliated unicellular organism.
- Extrachromosomal: Sequence is found in another extrachromosomal
element not listed here, such as a B chromosome or
an F factor.
- Plasmid: Sequence is on a bacterial plasmid.
- Transposon: Sequence is from a transposable element.
- Insertion sequence: Sequence is from an integrated
transposon.
- Cyanelle: Sequence is from an algal cyanelle.
- Proviral: Sequence is from an integrated viral chromosome.
If the submission type was Phylogenetic study, this
field will read "Default Genetic Code." Please use this
field to select the genetic code which should be used
to translate the nucleic acid sequence. The genetic
code for the nuclear genes of most eukaryotic organisms is "Standard". If you
selected a scientific organism name from the scrollable
list described above, this field was filled out automatically.
If you encode the organism name directly in the first
line of the file which contains your sequence, Sequin
will fill out this field automatically after your sequence
is imported. However, if the organism is rare, that
is, it is not among the top 500 organisms represented
in GenBank, this field will not be filled out automatically,
and you must select the genetic code.
Listed here are the translation tables which can be
selected. For more information, and for the translation
tables themselves, see the NCBI taxonomy
page.
- Standard
- Vertebrate mitochondrial
- Yeast mitochondrial
- Mold mitochondrial, etc. This selection includes
mold, protozoan, and coelenterate mitochondria as
well as mycoplasma and spiroplasma.
- Invertebrate mitochondrial
- Ciliate nuclear, etc. This selection includes ciliate,
dasycladacean and hexamita nuclei.
- Echinoderm mitochondrial
- Euplotid nuclear
- Bacterial. This selection includes all eubacteria
and archaebacteria.
- Alternative yeast nuclear
- Ascidian mitochondrial
- Flatworm mitochondrial
- Blepharisma macronuclear
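If you prepare submissions with a script, it can help to know that the selections above correspond to NCBI's numbered translation tables. The following is only an illustrative sketch: the table numbers are taken from NCBI's published genetic code list rather than from this help document, the spelling "Euplotid" follows that list, and GENETIC_CODE_IDS and genetic_code_id are hypothetical helper names that are not part of Sequin.

```python
# Hypothetical lookup from the menu names listed above to NCBI
# translation table numbers. The numbers are an assumption drawn from
# NCBI's genetic code list, not from Sequin itself.
GENETIC_CODE_IDS = {
    "standard": 1,
    "vertebrate mitochondrial": 2,
    "yeast mitochondrial": 3,
    "mold mitochondrial": 4,
    "invertebrate mitochondrial": 5,
    "ciliate nuclear": 6,
    "echinoderm mitochondrial": 9,
    "euplotid nuclear": 10,
    "bacterial": 11,
    "alternative yeast nuclear": 12,
    "ascidian mitochondrial": 13,
    "flatworm mitochondrial": 14,
    "blepharisma macronuclear": 15,
}


def genetic_code_id(name: str) -> int:
    """Return the translation table number for a menu name.

    Matching ignores case, extra spaces, and a trailing ", etc.".
    """
    key = " ".join(name.lower().replace(", etc.", "").split())
    try:
        return GENETIC_CODE_IDS[key]
    except KeyError:
        raise ValueError(f"unknown genetic code name: {name!r}") from None
```

For example, genetic_code_id("Vertebrate mitochondrial") looks up the table used for vertebrate mitochondrial sequences.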
The nucleotide sequence(s) and associated descriptive
information are entered on this page. Sequin can also
interpret the name of the organism, strain, chromosome,
and many other modifiers which are encoded on the first
line of the file which contains the nucleotide sequence.
This section of the documentation describes how to format
this data.
If you are submitting sequences as part of a phylogenetic,
mutation, or population study, you may encode the organism,
strain, chromosome, and other modifiers in one of two
places. This information can be encoded in the file
which contains the sequence. Alternatively, you can
enter the information on the
Source Modifiers form which follows the Organism
and Sequences Form.
If you are submitting a set of aligned sequences, and
one of those sequences is already present in the GenBank/EMBL/DDBJ
database, you must mark that sequence so that it does
not receive a new accession number. Instead of supplying
that sequence with a new Sequence Identifier, give it
the identifier accU12345, where U12345 is the accession
number of the sequence.
A database sequence can represent one of several different
molecule types. Enter in the Molecule pop-up menu the
type of molecule that was sequenced.
- Genomic DNA: Sequence derived directly from the
DNA of an organism. Note: The DNA sequence of an rRNA
gene has this molecule type.
- Genomic RNA: Sequence derived directly from the
genomic RNA of certain organisms, such as viruses.
- Precursor RNA: An RNA transcript before it is processed
into mRNA, rRNA, tRNA, or other cellular RNA species.
- mRNA[cDNA]: A cDNA sequence derived from mRNA.
- Ribosomal RNA: A sequence derived from the RNA in
ribosomes, for example, the sequence of a cDNA derived
from rRNA.
- Transfer RNA: A sequence derived from the RNA in
a transfer RNA, for example, the sequence of a cDNA
derived from tRNA.
- Small nuclear RNA: A sequence derived from small
nuclear RNA, for example, the sequence of a cDNA derived
from snRNA.
- Small cytoplasmic RNA: A sequence derived from small
cytoplasmic RNA, for example, the sequence of a cDNA
derived from small cytoplasmic RNA.
- Other-Genetic [plasmid]: A sequence that is not
normal genetic material but that also is not a
transcription product. Examples include plasmids,
B chromosomes, and F factors.
Please choose the topology of the molecule, either
Linear or Circular, from the pop-up menu. Most sequences
have linear topology. Select Circular if the sequence
is complete and it has a circular topology, for example,
it is a plasmid or a complete mitochondrial genome.
If the sequence is incomplete at the 5' or 3' end,
please check the appropriate box. If a complete sequence
is entered, for example, the complete coding sequence
of a gene, do not check either box.
This box will not be visible if you selected FASTA+GAP,
PHYLIP, NEXUS Interleaved, or NEXUS Contiguous format
on the Sequence Format form.
We suggest that for standard simple submissions you
follow the instructions below for how to create a nucleotide
sequence in FASTA format. In FASTA format, the line
preceding the lines of sequence consists of a ">" sign,
followed by some descriptive information. If you follow
the instructions, and the line immediately above your
sequence reads
>SeqID [org=organism scientific name] title
check this box. The unique sequence identifier for
the sequence will be the word which immediately follows
the ">".
If you have not included a SeqID, leave this box unchecked,
and Sequin itself will assign a unique sequence identifier.
However, be sure that the first line of descriptive
information starts with a ">".
The sequence(s) which you will be submitting should
be located in another file on your computer; you cannot
directly type sequence data into this page. The sequence(s)
must be in a certain format, called the FASTA format.
Each line of sequence should be no longer than 80 characters.
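If your sequence is currently stored as one long string, a short script can write it out in this layout. The following is a minimal sketch, assuming plain Python; format_fasta is a hypothetical helper name and is not part of Sequin.

```python
def format_fasta(defline: str, sequence: str, width: int = 80) -> str:
    """Return one FASTA record with sequence lines of at most `width` characters."""
    # The descriptive line must begin with ">".
    if not defline.startswith(">"):
        defline = ">" + defline
    # Drop any spaces or line breaks already present in the sequence.
    residues = "".join(sequence.split())
    # Cut the sequence into chunks no longer than `width`.
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([defline] + lines) + "\n"
```

Writing the returned text to a file, for example with open("myseq.fsa", "w"), gives a file that can then be imported on this page. The file name here is only an example.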
Note: If you are submitting multiple sequences as part
of a phylogenetic, population, or mutation study, each
sequence must be in FASTA format. However, it does not
matter if the sequences are in one file or separate
files on your computer. You can encode information about
the sequence, such as the organism, chromosome, or strain,
in the file that contains the sequence, as described
below. Alternatively, this information can be added
on the Source Modifiers
form which follows the Organism and Sequences Form.
The line directly above the sequence (the first line
in the file, for a single sequence) should read
>SeqID [org=organism scientific name] [modifier=modifier name] title
for example,
>DNA.new [org=Homo sapiens] [chromosome=17] [map=17q21] Human breast and ovarian cancer susceptibility (BRCA1) mRNA, complete cds.
- >: The ">" sign must precede any descriptive information
about your sequence. Any line which does not begin
with a ">" sign will be interpreted as a line of nucleotide
sequence.
- SeqID: A unique sequence identifier which you give
your sequence. This field is required only if you
have checked the box on the Nucleotide page entitled
"FASTA def line starts with sequence ID." This name
must be different from the name which you give to
other nucleotide or protein sequence(s). If you do
not include an identifier here, Sequin will create
one for you. The identifier will be changed to an
accession number by the database staff later in the
submission process. If you are submitting a set of
aligned sequences, and one of those sequences is already
present in the GenBank/EMBL/DDBJ database, you must
mark that sequence so that it does not receive a new
accession number. Instead of supplying that sequence
with a new SeqId, give it the identifier accU12345,
where U12345 is the accession number of the sequence.
- [org=organism scientific name]: This field gives
you the opportunity to indicate the name of the organism
directly on the sequence file. You must indicate the
name of the organism here if you are submitting sequences
as part of a phylogenetic study. For other types of
submissions, you may indicate the organism name either
with this field or by filling out the previous Organism
page. You must enter the complete scientific name
(no abbreviations). The field must be written as shown,
complete with brackets. Do not put spaces around the
"=". Some common examples include [org=Homo sapiens],
[org=Mus musculus], [org=Saccharomyces cerevisiae],
and [org=Drosophila melanogaster]. The NCBI maintains
a taxonomy database with information about scientific
and common organism names.
- [modifier=modifier name]: Additional modifiers can
also be encoded on the first line of the nucleotide
sequence. These modifiers include chromosome, map,
clone, subclone, haplotype, genotype, sex, cell-line,
cell-type, tissue-type, clone-lib, dev-stage, frequency,
germline, rearranged, lab-host, pop-variant, tissue-lib,
plasmid-name, transposon-name, ins-sequence-name,
plastid-name, strain, substrain, type, subtype, variety,
serotype, serogroup, serovar, cultivar, pathovar,
chemovar, biovar, biotype, group, subgroup, isolate,
common, acronym, dosage, natural host, and sub-species.
Complete descriptions of these modifiers can be found
in the Source and
Organism subpages of the Biological Source Modifiers
page. You may include as many modifiers as you like,
but each must be bounded by a set of brackets. The
name of the modifier must be written exactly as shown
in the list above. An example of a string of modifiers
is [strain=BALB/c] [chromosome=5] [sex=male] [tissue-type=testis].
Modifiers can also be added as
Biological Source descriptors or features later
in the submission process.
- [lineage=lineage]: If you are working with an organism
whose lineage is not listed in the NCBI taxonomy database,
you can provide the complete lineage here.
- title: A definition or description of the sequence.
It is important to choose the title carefully, as
it will become the Definition line of the entry. This
line is the brief description of the sequence that
appears in the output of many molecular biology analysis
programs, such as the BLAST program. If you are submitting
a set of sequences from a phylogenetic, population,
or mutation study, you can leave this field blank.
You can instead add the sequence titles on the
Annotation page, below. Sequin will also create
titles automatically, using the Generate Definition
Line function under the Annotate
menu. Titles can be edited later in the submission
process by selecting Descriptors-->Title, under the
Annotate menu in the record viewer, described
below.
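To check a definition line before importing it, you can split it into the parts described above. The following is a rough sketch of one way to do so and is not Sequin's own parser; parse_defline and MODIFIER_RE are hypothetical names, and the sketch assumes that modifier values contain no square brackets.

```python
import re

# One bracketed field of the form [name=value].
MODIFIER_RE = re.compile(r"\[([^=\[\]]+)=([^\[\]]*)\]")


def parse_defline(line: str, has_seqid: bool = True):
    """Split a FASTA definition line into (seqid, modifiers, title)."""
    if not line.startswith(">"):
        raise ValueError("a definition line must start with '>'")
    rest = line[1:].strip()
    seqid = None
    # The SeqID, if present, is the word immediately after the ">".
    if has_seqid and rest and not rest.startswith("["):
        parts = rest.split(None, 1)
        seqid = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
    # Collect every [name=value] field, including [org=...].
    modifiers = {name: value.strip() for name, value in MODIFIER_RE.findall(rest)}
    # Whatever remains once the bracketed fields are removed is the title.
    title = " ".join(MODIFIER_RE.sub(" ", rest).split())
    return seqid, modifiers, title
```

Applied to the BRCA1 example above, the function returns the identifier DNA.new, a dictionary holding the org, chromosome, and map fields, and the title text.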
Nucleotide definition lines, or titles, follow a structured
format:
Genus species Protein name (gene name) mRNA/gene, [one
of 4 from below], complete/partial cds
nuclear gene encoding mitochondrial protein
nuclear gene encoding chloroplast protein
mitochondrial gene encoding mitochondrial protein
chloroplast gene encoding chloroplast protein
Use the name in the format of Genus species, unless
the organism is Human. Choose mRNA or gene depending
on whether you have sequenced mRNA or genomic DNA. Choose
complete or partial cds depending on whether the sequence
is complete or partial.
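The general pattern can be assembled mechanically from its parts. The following is a sketch only; build_definition_line and its parameters are hypothetical names chosen for illustration, and the function covers just the structured format described above.

```python
# The four optional location phrases listed above.
LOCATION_PHRASES = {
    "nuclear gene encoding mitochondrial protein",
    "nuclear gene encoding chloroplast protein",
    "mitochondrial gene encoding mitochondrial protein",
    "chloroplast gene encoding chloroplast protein",
}


def build_definition_line(organism, protein, molecule, completeness,
                          gene=None, location=None):
    """Assemble a definition line in the structured format."""
    if molecule not in ("mRNA", "gene"):
        raise ValueError("molecule must be 'mRNA' or 'gene'")
    if completeness not in ("complete", "partial"):
        raise ValueError("completeness must be 'complete' or 'partial'")
    if location is not None and location not in LOCATION_PHRASES:
        raise ValueError("unrecognized location phrase")
    name = f"{organism} {protein}"
    if gene:
        name += f" ({gene})"  # the gene name goes in parentheses
    parts = [f"{name} {molecule}"]
    if location:
        parts.append(location)
    parts.append(f"{completeness} cds")
    return ", ".join(parts) + "."
```

For example, calling it with "Saccharomyces cerevisiae", "cystathionine gamma-lyase", "gene", "complete", and gene="CYS3" gives "Saccharomyces cerevisiae cystathionine gamma-lyase (CYS3) gene, complete cds."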
However, the general format does not cover all possible
Definition lines, as shown in the following examples:
- Human breast and ovarian cancer susceptibility (BRCA1)
mRNA, complete cds.
- Human breast and ovarian cancer susceptibility (BRCA1) gene, exon 4.
- Gallus gallus red-sensitive pigment mRNA, complete cds.
- Bos taurus retinal pigment (RPE1) mRNA, 3' end.
- Saccharomyces cerevisiae cystathionine gamma-lyase (CYS3) gene, complete cds.
- Arabidopsis thaliana pyruvate dehydrogenase E1 alpha subunit mRNA,
nuclear gene encoding mitochondrial protein, complete cds.
- Rattus norvegicus fos-related antigen 2 (fra-2) mRNA, complete cds.
- Human Down syndrome region, chromosome 13, genomic sequence.
- Mus musculus GGT trinucleotide repeat, chromosome 1, genomic sequence.
For rRNA, things are a bit simpler, and you only need
to have:
Genus species (optional: isolate #) 16S mitochondrial
ribosomal RNA, large/small subunit, mitochondrial gene.
For example:
Ophraella conferta isolate 62 16S mitochondrial ribosomal RNA, large subunit, mitochondrial gene.
>DNA.new [org=Homo sapiens] [chromosome=17] [map=17q21] Human breast and ovarian cancer susceptibility (BRCA1) mRNA, complete cds.
A number of programs output sets of aligned sequences
in FASTA format. Frequently, in order to align these
sequences, gaps must be inserted. You cannot submit
gapped sequences in standard FASTA format. In FASTA+GAP
format, gaps can be indicated by a "-". Each sequence,
including gaps, must be the same length. The gaps will
only show up in the alignment, not in the individual
sequence in the database.
Sequences in FASTA+GAP format resemble FASTA sequences.
The previous section on
FASTA format for nucleotide sequences has instructions
for formatting FASTA sequences. All sequences in FASTA+GAP
format should be in the same file.
The following is an example of FASTA+GAP format:
>A-0V-1-A [org=Gallus gallus] [strain=C]
TCACTCTTTGGCAACGACCCGTCGTCATAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
>A-0V-2-A [org=Drosophila melanogaster] [strain=D]
TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
>A-0V-3-A [org=Caenorhabditis elegans] [strain=E]
TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
>A-0V-4-A [org=Rattus norvegicus] [strain=F]
TCACTCTTTGGCAACGACCCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
>A-0V-7-A [org=Aspergillus nidulans] [strain=G]
TCACTCTTTGGCAACGACCAGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
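Before importing a FASTA+GAP file, you may want to confirm that every sequence, gaps included, has the same length. The following is a minimal sketch of such a check and is not part of Sequin; read_fasta_gap, check_alignment, and ungapped are hypothetical helper names.

```python
def read_fasta_gap(text: str) -> dict:
    """Read FASTA+GAP text into a dict mapping sequence ID to gapped sequence."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            if not parts:
                raise ValueError("definition line has no sequence ID")
            current = parts[0]  # the word right after ">" is the ID
            if current in records:
                raise ValueError(f"duplicate sequence ID: {current}")
            records[current] = ""
        elif current is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            records[current] += line
    return records


def check_alignment(records: dict) -> int:
    """Return the common aligned length, or raise if the lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return lengths.pop() if lengths else 0


def ungapped(records: dict) -> dict:
    """Remove '-' gap characters, leaving the individual sequences."""
    return {name: seq.replace("-", "") for name, seq in records.items()}
```

Running check_alignment on the result of read_fasta_gap raises an error if any sequence is shorter or longer than the others. The ungapped function shows what each sequence looks like once the gaps are removed.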
A number of programs output sets of aligned sequences
in PHYLIP format.
The following is an example of PHYLIP format.
5 100
A-0V-1-A TCACTCTTTG GCAACGACCC GTCGTCATAA TAAAGATAGA GGGGCAACTA
A-0V-2-A TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
A-0V-3-A TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
A-0V-4-A TCACTCTTTG GCAACGACCC GTCGTCACAA TAAAGATAGA GGGGCAACTA
A-0V-7-A TCACTCTTTG GCAACGACCA GTCGTCACAA TAAAGATAGA GGGGCAACTA
AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
In this example, the first line indicates that there
are 5 sequences, each with 100 nt of sequence. The following
five lines contain the Sequence IDs, followed by the
sequences. Specifically, the sequence identifier for
the first sequence is A-0V-1-A. Note that subsequent
blocks of sequence do not contain the Sequence ID.
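The layout described above can be read with a few lines of code. The following Python sketch is only an illustration and is not how Sequin itself reads the file. It assumes that the Sequence ID and the sequence are separated by white space (strict PHYLIP reserves the first ten columns for the name), and the function name parse_phylip_interleaved is hypothetical.

```python
def parse_phylip_interleaved(text):
    """Return {seq_id: sequence} from interleaved PHYLIP text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # First line: number of sequences, then number of characters per sequence.
    ntax, nchar = (int(x) for x in lines[0].split()[:2])
    ids, seqs = [], []
    # First block: each line starts with the Sequence ID.
    for ln in lines[1:1 + ntax]:
        name, *chunks = ln.split()
        ids.append(name)
        seqs.append("".join(chunks))
    # Later blocks carry no IDs; lines repeat in the same sequence order.
    for i, ln in enumerate(lines[1 + ntax:]):
        seqs[i % ntax] += "".join(ln.split())
    for name, seq in zip(ids, seqs):
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} characters, found {len(seq)}")
    return dict(zip(ids, seqs))


sample = """2 20
A-0V-1-A TCACTCTTTG
A-0V-2-A TCACTCTTTG
GCAACGACCC
GCAAC---GC
"""
print(parse_phylip_interleaved(sample))
```

The length check against the second number on the first line catches a common problem, a block that was lost or duplicated when the file was edited by hand.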
If you wish, you can modify this format slightly so
that Sequin can determine the correct organism, and
any other modifiers, for each sequence. An example of
such modifications is given below in the section on
Source Modifiers for PHYLIP and NEXUS.
Alternatively, you can leave your sequence alignment
in standard PHYLIP format and enter the organism, strain,
chromosome, etc. information on the following
Source Modifiers form.
A number of programs output sets of aligned sequences
in one of two NEXUS formats, NEXUS Interleaved and NEXUS
Contiguous.
The following is an example of NEXUS Interleaved format.
#NEXUS
[This data assembled using Sequencher*, from Gene Codes Corporation.]
begin data;
dimensions ntax=5 nchar=100;
format datatype=dna gap=- interleave;
matrix
A-0V-1-A TCACTCTTTG GCAACGACCC GTCGTCATAA TAAAGATAGA GGGGCAACTA
A-0V-2-A TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
A-0V-3-A TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
A-0V-4-A TCACTCTTTG GCAACGACCC GTCGTCACAA TAAAGATAGA GGGGCAACTA
A-0V-7-A TCACTCTTTG GCAACGACCA GTCGTCACAA TAAAGATAGA GGGGCAACTA
A-0V-1-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
A-0V-2-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
A-0V-3-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
A-0V-4-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
A-0V-7-A AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
In this example, the first few lines provide information
about the data in the sequence alignment. The following
five lines contain the Sequence IDs, followed by the
sequences. Specifically, the sequence identifier for
the first sequence is A-0V-1-A. Note that subsequent
blocks of sequence also contain the Sequence ID.
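Because every block repeats the Sequence ID, an interleaved NEXUS matrix can be reassembled by appending each line to the entry that carries the same ID. The Python sketch below illustrates this under some assumptions: the matrix begins on the line after the word "matrix", it ends at the first ";" or at the end of the text, and comments are enclosed in square brackets. The function name parse_nexus_interleaved is hypothetical, and this is not Sequin's own reader.

```python
import re


def parse_nexus_interleaved(text):
    """Return {seq_id: sequence} from the matrix of an interleaved NEXUS file."""
    text = re.sub(r"\[.*?\]", "", text, flags=re.S)  # drop [comments]
    seqs = {}
    in_matrix = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not in_matrix:
            # Skip the header lines until the "matrix" keyword is reached.
            in_matrix = line.lower() == "matrix"
            continue
        done = line.endswith(";")  # a ";" closes the matrix
        fields = line.rstrip(";").split()
        if fields:
            name, *chunks = fields
            seqs[name] = seqs.get(name, "") + "".join(chunks)
        if done:
            break
    return seqs


sample = """#NEXUS
begin data;
dimensions ntax=2 nchar=20;
format datatype=dna gap=- interleave;
matrix
A-0V-1-A TCACTCTTTG
A-0V-2-A TCACTCTTTG
A-0V-1-A GCAACGACCC
A-0V-2-A GCAAC---GC
;
end;
"""
print(parse_nexus_interleaved(sample))
```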
If you wish, you can modify this format slightly so
that Sequin can determine the correct organism, and
any other modifiers, for each sequence. An example of
such modifications is given below in the section on
Source Modifiers for PHYLIP and NEXUS. Alternatively,
you can leave your sequence alignment in standard NEXUS
format and enter the organism, strain, chromosome, etc.
information on the following
Source Modifiers form.
The following is an example of NEXUS Contiguous format.
#NEXUS
BEGIN DATA;
DIMENSIONS NTAX=5 NCHAR=100;
FORMAT MISSING=? GAP=- DATAtype=DNA ;
MATRIX
A-0V-1-A
TCACTCTTTGGCAACGACCCGTCGTCATAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
A-0V-2-A
TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
A-0V-3-A
TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
A-0V-4-A
TCACTCTTTGGCAACGACCCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
A-0V-7-A
TCACTCTTTGGCAACGACCAGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
In this example, the first few lines provide information
about the data in the sequence alignment. Each Sequence
ID then appears on a line of its own, followed by the
complete sequence for that entry. Specifically, the sequence
identifier for the first sequence is A-0V-1-A. Unlike the
interleaved format, the sequence is not divided into
blocks, so each Sequence ID appears only once.
If you wish, you can modify this format slightly so
that Sequin can determine the correct organism, and
any other modifiers, for each sequence. An example of
such modifications is given below in the section on
Source Modifiers for PHYLIP and NEXUS. Alternatively,
you can leave your sequence alignment in standard NEXUS
format and enter the organism, strain, chromosome, etc.
information on the following
Source Modifiers form.
If you wish, you can modify the PHYLIP or NEXUS formats
so that Sequin can determine the correct organism, and
any other modifiers, for each sequence. The modifications
in this case consist of the addition of lines at the
end of the file after the sequence. The first line applies
to the first sequence, the second line to the second
sequence, and so on. You must have one line for each
sequence. These inserted lines resemble the line that
immediately precedes the sequence in a FASTA file. The
major difference is that these lines should not begin
with a Sequence ID. Instead, the local Sequence ID for
Sequin is the name to the left of the first line of
sequence.
Each of the initial lines starts with the character
">". The scientific organism name follows in brackets.
Optional modifiers also follow in brackets. You can
add individual sequence titles on these lines, or you
can add the same title to all sequences on the Annotation
page. For further information on the data that can go
in the lines preceding the sequences, see the instructions
entitled "FASTA format for nucleotide sequences",
above. For instructions on formatting a sequence
title, see Nucleotide Definition line (title),
above.
The following lines indicating the organism and strain
of each sequence would follow immediately after the
sequence in the PHYLIP and NEXUS examples, above.
>[org=Gallus gallus] [strain=C]
>[org=Drosophila melanogaster] [strain=D]
>[org=Caenorhabditis elegans] [strain=E]
>[org=Rattus norvegicus] [strain=F]
>[org=Aspergillus nidulans] [strain=G]
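If the organism and modifier information is already held in a table or spreadsheet, the lines shown above can be generated rather than typed. The Python sketch below is one possible way to do this. The function names modifier_lines and append_modifiers are hypothetical, and the input is assumed to be a list of (organism, modifiers) pairs given in the same order as the sequences in the alignment.

```python
def modifier_lines(sources):
    """Build one ">" line per sequence from (organism, modifiers) pairs."""
    lines = []
    for organism, modifiers in sources:
        parts = [f"[org={organism}]"]
        # No spaces around "=", and each modifier in its own brackets.
        parts += [f"[{key}={value}]" for key, value in modifiers.items()]
        lines.append(">" + " ".join(parts))
    return lines


def append_modifiers(alignment_text, sources):
    """Return the alignment text with the modifier lines added at the end."""
    return alignment_text.rstrip("\n") + "\n" + "\n".join(modifier_lines(sources)) + "\n"


sources = [
    ("Gallus gallus", {"strain": "C"}),
    ("Drosophila melanogaster", {"strain": "D"}),
]
for line in modifier_lines(sources):
    print(line)
```

Remember that the list must contain exactly one entry for each sequence in the alignment, in the same order.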
Alternatively, you can leave your sequence alignment
in standard PHYLIP or NEXUS format and enter the organism, strain,
chromosome, etc. information on the following
Source Modifiers form.
Sequin can also read segmented sets which are part
of an alignment if the sequences are in FASTA or FASTA+GAP
format. Each segment should have its own sequence identifier
(the term immediately following the ">"), but the organism
name and source modifiers should only be indicated for
the first segment from each sequence. Square brackets
are used to delimit the members of a set. For example,
[
>A-0V-1-Apart1 [org=Gallus gallus] [strain=C]
TCACTCTTTGGCAAC
>A-0V-1-Apart2
GACCCGTCGTCATAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
]
[
>A-0V-2-Apart1 [org=Drosophila melanogaster] [strain=D]
TCACTCTTTGGCAAC
>A-0V-2-Apart2
GAAGCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
]
If the sequence is in FASTA+GAP format, Sequin will
keep the alignment provided. However, if the sequence
is in FASTA format, you must click on the Create Alignment
button in order to make Sequin generate the alignment.
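The bracket notation for segmented sets shown above can be checked before import. The following Python sketch groups the segments of each bracketed set. It is an illustration only: the function name parse_segmented_sets is hypothetical, and the sketch assumes that each "[" and "]" stands alone on its own line, as in the example.

```python
def parse_segmented_sets(text):
    """Return a list of sets; each set is a list of (seq_id, sequence) segments."""
    sets = []
    current = None  # segments of the set being read, or None outside "[ ... ]"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "[":
            current = []
        elif line == "]":
            if current is None:
                raise ValueError("']' found without a matching '['")
            sets.append([(seq_id, "".join(parts)) for seq_id, parts in current])
            current = None
        elif current is None:
            raise ValueError(f"line outside a bracketed set: {line!r}")
        elif line.startswith(">"):
            # The segment's Sequence ID is the first word after ">".
            fields = line[1:].split()
            current.append((fields[0] if fields else "", []))
        elif current:
            current[-1][1].append(line)
        else:
            raise ValueError("sequence line found before any '>' line")
    if current is not None:
        raise ValueError("'[' found without a matching ']'")
    return sets


sample = """[
>A-0V-1-Apart1 [org=Gallus gallus] [strain=C]
TCACTCTTTGGCAAC
>A-0V-1-Apart2
GACCCGTCGTCATAA
]
"""
for group in parse_segmented_sets(sample):
    print([seq_id for seq_id, _ in group])
```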
After your sequence is in the appropriate FASTA or
PHYLIP format, click on "Import Nucleotide FASTA" or
"Import Nucleotide PHYLIP". A new window will open showing
available directories and files. Select the file containing
your sequence and click OK. The sequence will be imported
automatically. If you have imported the wrong sequence,
select Clear under the Edit menu to remove the sequence.
After you import your sequence, a box will appear with
information about the sequence. The first line will
describe the number of nucleotide segments imported,
and the total length in nucleotides of the sequence.
Each segment is numbered, and its length, unique identifier
(SeqID), and title (Definition line) are listed. If any of
this information is missing, check the file containing
the sequence and re-import the sequence.
Sets of FASTA-formatted nucleotide sequences can be
imported into Sequin in one of two ways. If all the
sequences are in the same file, import the file by clicking
on "Import Nucleotide FASTA." If the sequences are in
separate files, import them sequentially by clicking
on "Import Nucleotide FASTA." In either case, the line
immediately preceding each sequence must follow the
FASTA format described above.
Note: This page is for adding protein sequences to
a single or segmented sequence. If you submitted a set
of nucleotide sequences from a population, phylogenetic,
or mutation study, this page will instead be called the
Annotation Page, and
is described below.
This page allows you to provide the optional protein
sequence translation of the nucleotide sequence which
you just entered. If the nucleotide sequence is alternatively
spliced or contains multiple open reading frames, enter
all of the protein sequences on this page. Each protein
sequence will appear in the database record as a coding
sequence (CDS) feature. Sequin will automatically determine
which nucleotide sequences code for the protein, and
indicate the nucleotide sequence interval on the database
record. Sequin also provides tools which allow you to
view a graphical representation of all the open reading
frames in your nucleotide sequence, and to convert these
reading frames into CDS features. These tools are described
later in the help documentation under the
ORF Finder.
Most protein entries are computer-generated conceptual
translations of a nucleic acid sequence. If you have
confirmed this translation by direct sequencing either
of the entire protein or of peptides derived from the
protein, please check this box.
If the sequence is lacking amino acids at the amino-
or carboxy-terminal end of the protein, please check
the appropriate box. If the amino acid sequence represents
the entire coding region of a protein, do not check
either box.
We suggest that for standard simple submissions you
follow the instructions below for how to create a protein
sequence in FASTA format. In FASTA format, the line
preceding the lines of sequence consists of a ">" sign,
followed by some descriptive information. If you follow
the instructions, and the line immediately above your
sequence reads
>SeqID [gene=locus;description] [prot=name;description]
title
check this box. The unique sequence identifier for
the sequence will be the word following the ">" sign.
If you have not included a SeqID, leave this box unchecked,
and Sequin itself will assign a unique sequence identifier.
However, be sure that the first line of descriptive
information starts with a ">".
If you check this box, Sequin will make an mRNA feature
with the same initial intervals (i.e., range of sequence)
as the CDS feature. After the record has been assembled,
you should edit the mRNA feature location to add the
5' UTR and 3' UTR intervals. This may be done either
in the mRNA editor or in the sequence editor.
This amino acid sequence should be located in another
file on your computer; you cannot directly type sequence
data into this form. It must be in a certain format,
called the FASTA format. If you are submitting multiple
sequences, each one must be in FASTA format. Each line
of sequence should be no longer than 80 characters.
Remove any symbols for stop codons, such as "Z" or "*",
from your sequence before importing it into Sequin.
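The two formatting rules above (no stop codon symbols, and no more than 80 characters per line) are easy to apply with a short script. The Python sketch below is one way to do so, and the function name format_protein is hypothetical. By default it strips only "*". Because "Z" is also used in some files as the ambiguity code for glutamic acid or glutamine, add it to stop_symbols only if your file really uses it to mark a stop codon.

```python
def format_protein(seq, stop_symbols="*", width=80):
    """Remove white space and stop symbols, then split into lines of `width`."""
    cleaned = "".join(
        ch for ch in seq if not ch.isspace() and ch not in stop_symbols
    )
    return [cleaned[i:i + width] for i in range(0, len(cleaned), width)]


raw = "MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQCPLCKNDITKRSLQESTRFSQLVEEL*"
for line in format_protein(raw):
    print(line)
```

The returned lines can be written to the file directly below the line that begins with ">".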
The line directly above the sequence (the first line
in the file, for a single sequence) should read:
>SeqID [gene=locus;description] [prot=name;description]
[comment=text] title
for example,
>Prot.new [gene=BRCA1] [prot=Breast and ovarian cancer susceptibility protein]
Human breast and ovarian cancer susceptibility (BRCA1) protein, complete sequence.
- >: The ">" sign must precede any descriptive information
about your sequence. Any line which does not begin
with a ">" sign will be interpreted as a line of protein
sequence.
- SeqID: A unique sequence identifier which you give
your sequence. This field is required only if you
have checked the box on the Protein page entitled
"FASTA def line starts with sequence ID." This name
must be different from the name which you give to
other nucleotide or protein sequence(s). If you do
not include an identifier here, Sequin will create
one for you. The identifier will be changed to an
accession number by the database staff later in the
submission process.
- [gene=gene name]: Enter [gene=gene name]. Do not
put spaces around the "=". The brackets are required.
An example is [gene=eIF4E].
- [prot=protein name]: Enter [prot=protein name].
Do not put spaces around the "=". The brackets are
required. An example is [prot=eukaryotic initiation
factor 4E-I].
- [comment=text]: Enter [comment=text]. Do not put
spaces around the "=". This field is optional. Any
text that is entered will become a /note on the CDS
feature (protein sequence).
- title: A definition or description of the sequence.
It is important to choose the title carefully, as
it will become the Definition line of the protein
sequence. This line is the brief description of the
sequence that appears in the output of many molecular
biology analysis programs, such as the BLAST program.
If you do not enter a title, Sequin will create one
based on information you have provided. The database
staff will amend it if necessary.
The protein name should be included in the entry; all
other fields are optional. If you do not supply a title,
Sequin will generate one from the other information
you have provided.
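To see how the fields described above fit together, the following Python sketch splits a definition line into its SeqID, its bracketed modifiers, and its title. It is an illustration only and does not reproduce Sequin's own parser. The function name parse_defline and the has_seqid parameter (which stands in for the "FASTA def line starts with sequence ID" box) are hypothetical, and the whole definition line is assumed to be on a single line of the file.

```python
import re

MODIFIER = re.compile(r"\[(\w+)=([^\]]*)\]")


def parse_defline(line, has_seqid=True):
    """Return (seq_id, modifiers, title) for a line that starts with ">"."""
    if not line.startswith(">"):
        raise ValueError("a definition line must start with '>'")
    rest = line[1:].strip()
    seq_id = None
    if has_seqid and rest and not rest.startswith("["):
        # The SeqID is the word that immediately follows ">".
        seq_id, _, rest = rest.partition(" ")
    modifiers = dict(MODIFIER.findall(rest))
    # Whatever remains after removing the bracketed fields is the title.
    title = " ".join(MODIFIER.sub(" ", rest).split())
    return seq_id, modifiers, title


line = (">Prot.new [gene=BRCA1] "
        "[prot=Breast and ovarian cancer susceptibility protein] "
        "Human breast and ovarian cancer susceptibility (BRCA1) protein, "
        "complete sequence.")
print(parse_defline(line))
```

An empty modifiers result for a line that was meant to carry [gene=...] or [prot=...] usually means that a bracket is missing or that there are spaces around the "=".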
Protein definition lines, or titles, follow a structured
format:
Genus species Protein name (gene name) protein, [one
of 4 from below], complete/partial sequence.
mitochondrial protein encoded by nuclear gene
chloroplast protein encoded by nuclear gene
mitochondrial protein encoded by mitochondrial gene
chloroplast protein encoded by chloroplast gene
Use the name in the format of Genus species, unless
the organism is Human. Choose complete or partial depending
on whether the sequence is complete or partial.
However, the general format does not cover all possible
Definition lines, as shown in the following examples:
- Human breast and ovarian cancer susceptibility (BRCA1)
protein, complete sequence.
- Human breast and ovarian cancer susceptibility (BRCA1) protein, translation of exon 4.
- Gallus gallus red-sensitive pigment protein, complete sequence.
- Bos taurus retinal pigment (RPE1) protein, carboxyl terminus.
- Saccharomyces cerevisiae cystathionine gamma-lyase (CYS3) protein, complete sequence.
- Arabidopsis thaliana pyruvate dehydrogenase E1 alpha subunit protein,
mitochondrial protein encoded by nuclear gene, complete sequence.
- Rattus norvegicus fos-related antigen 2 (fra-2) protein, complete sequence.
- Human Down syndrome region, chromosome 13, translation of putative ORF in genomic sequence.
- Caenorhabditis elegans cosmid B0303, translation of putative ORF, similar to adenylate cyclase (SP:P08678).
>Prot.new [gene=BRCA1] [prot=Breast and ovarian cancer susceptibility protein]
Human breast and ovarian cancer susceptibility (BRCA1) protein, complete sequence.
Click on "Import Protein FASTA." A new window will
open showing available directories and files. Once a
filename is selected, click OK. The sequence will be
imported automatically. If you have imported the wrong
sequence, select Clear under the Edit menu to remove
the sequence.
After you import your sequence, a box will appear with
information about the sequence. The first line will
describe the number of protein sequences imported and
the total length in amino acids of the sequence. Each
sequence is numbered, and its length, unique identifier
(SeqID), Gene name, Protein name, and title (Definition
line) are listed. If any of this information is missing,
check the file containing the sequence and re-import
the sequence.
You may want to import a non-contiguous set of protein
sequences into Sequin if, for example, you are submitting
a nucleotide sequence with multiple open reading frames.
Sets of protein sequences can be imported into Sequin
in one of two ways. If all the sequences are in the
same file, import the file by clicking on "Import Protein
FASTA." If the sequences are in separate files, import
them sequentially by clicking on "Import Protein FASTA."
In either case, the line immediately preceding each
sequence must follow the FASTA format described above.
After you import protein sequence(s), a new window
will appear in which you can edit information about
the protein sequence. If you did not enter a unique
identifier (SeqID) for the sequence, Sequin generated
one automatically. The SeqID and protein name should
be filled out, but the protein name can also be modified
later in the submission process. Entering other information
is optional.
Note: This page is for adding annotations to sets of
sequences from phylogenetic, population, and mutation
studies. If you submitted a single or segmented sequence,
this page will instead be called the
Protein page, and is described above.
The radio buttons at the top of the page allow you
to choose which features to add to your sequences. Any
annotation you add on this page will be propagated to
ALL sequences in the set. If you want to add annotations
only to selected sequences, you must add them manually
later in the submission process. You may only select
one of these buttons.
- None: Select none if you do not want to add a rRNA
or CDS feature to all sequences in the set.
- rRNA: Select rRNA if you want to add an rRNA feature
to all sequences in the set. This rRNA will span the
entire sequence, from the first nucleotide to the
last. Once you mark the radio button, additional input
lines will be displayed on the form. If the sequence
does not encode a complete rRNA molecule, check Incomplete
at 5' end or Incomplete at 3' end, or both. Enter
the name of the rRNA, such as 16S rRNA. If there is
a gene symbol for the rRNA, enter it as well. The
comment box allows you to enter any additional comments
about the rRNA.
- CDS: Select CDS if you want to add a coding sequence
(amino acid translation) to all the sequences in the
set. Sequin will automatically determine the CDS by
selecting the longest open reading frame in the sequences.
If your sequence contains multiple open reading frames,
or if the desired open reading frame is not the longest,
you can edit the CDS feature later in the submission
process by using the
Coding Region feature form. Once you mark the
radio button, additional input lines will be displayed
on the form. We encourage you to fill in the protein
and gene name fields. Entering information in the
comment box is optional.
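The idea of selecting the longest open reading frame, mentioned under the CDS option above, can be illustrated with a short Python sketch. This is a simplified stand-in and not Sequin's actual method: it assumes the standard stop codons (TAA, TAG, TGA), requires each open reading frame to begin with ATG, and searches only the three forward frames. The function name longest_orf is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG-to-stop ORF on the forward
    strand, as 0-based positions with `end` exclusive, or None if none exists."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG since the previous stop codon
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


sequence = "CCATGAAATAGATGCCCGGGTTTTAA"
span = longest_orf(sequence)
if span is not None:
    print(span, sequence[span[0]:span[1]])
```

If the reading frame you want is not the longest one, this kind of automatic choice will select the wrong interval, which is why the CDS feature may need to be edited afterwards.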
This box allows you to add a title to each sequence
in the set. Note that the identical title will be added
to all sequences. You do not need to add a title here
if you already included one in the definition line of
your sequence. A detailed
description of the title formats preferred by the
databases is provided on the Nucleotide page, above.
Titles should always start with the name of the organism.
If all the sequences are from the same organism, incorporate
the organism name directly into the title. However,
if you are submitting sequences which derive from different
organisms, for example, a phylogenetic study, do not
include the organism names in the title. Instead, check
the box marked "Prefix title with organism name", and
Sequin will add the appropriate organism name to the
title for each sequence.
Sequin will also create titles automatically, using
the Generate Definition Line function under the
Annotate menu. In addition, titles can be edited
later in the submission process by selecting Descriptor-->Title,
under the Annotate menu in the record viewer, described
below.
You will only see this form if you are submitting sequences
as part of a phylogenetic, mutation, or population study.
This form allows you to add or modify necessary information,
such as the organism, and optional information, such
as strain or chromosome, to your sequences. From the
top pop-up menu, choose the modifier you want to annotate.
The left column lists the sequences by their SeqID,
or the unique identifier which you (or Sequin) provided
for your sequence. Type the modifier for each sequence
in the corresponding box labelled Value. For example,
if you select the Organism modifier, you might type
Mus musculus in the first Value box, Homo sapiens in
the second, etc.
If you have not already supplied the scientific name
of the organism, enter it on this form. Do not use abbreviations.
Complete descriptions of these modifiers can be found
in the Source and
Organism subpages of the Biological Source Modifiers
page. You may add as many modifiers as you wish.
After you finish the Organism and Sequences Form, Sequin
will process your entry based on the information you
have entered. The window you see now is called the record
viewer. This is also the window you will see if you
are submitting an update to an existing record. The
instructions after this point are the same whether you
are submitting a new record or an update.
In the default window of the record viewer, you will
see your entry approximately as it will appear in the
database. Most of the information which you entered
earlier in the submission process is present in the
viewer; other information, such as the contact, is still
present in the record but will not be visible in the
database entry. If you have provided a conceptual translation
of the nucleotide sequence, the translation will be
listed as a CDS Feature. Sequin automatically determines
which nucleotides encode the protein, and lists
them, even if the nucleotide sequence contains introns
and exons.
You can save the entry to a file by selecting Save
or Save As under the File menu. This is not the same
as saving the entry for submission to the database.
It is a good idea to save the file at this point so
that if you make any unwanted changes during the editing
process you can revert to the original copy. If you
wish to edit the entry later, click on "Read existing
Submission" on the Welcome to Sequin form and choose
the file.
It is likely that the entry could be processed now
for submission to the database. However, you may wish
to add additional information to the entry. This information
may be in the form of Descriptors or Features. In general,
Descriptors are annotations which apply to an entire
sequence, or an entire set of sequences, and Features
are annotations that apply to a specific sequence interval.
For example, you may want to change the Reference Descriptor
to add a published manuscript, or to annotate the sequence
by adding features such as a signal peptide or poly
A signal.
Information in the record viewer can be edited in different
ways. One way to add or modify information is to double
click within the block of information you wish to edit.
Many blocks, such as "Definition," "Keywords," "Source,"
"Reference," or "Features" can be edited. For example,
if you wish to add another reference for the sequence,
double click on "Reference" to access the appropriate
form.
A second way to add or modify information is to create
a new descriptor or feature by selecting the appropriate
form from the Misc or Features menus. These options
are described later in this help document.
Finally, you may need to edit the sequence itself.
Instructions for working
with the sequence are presented in the documentation
for the Sequence Editor.
Once you are satisfied that you have added all the
appropriate information, you must process your entry
for submission to the database. Select "Validate" under
the Search menu. This function detects discrepancies
between the format of your submission and that required
by the database selected for entry.
If Sequin detects problems with the format of your
record, you will see a screen listing the validation
errors as well as suggestions for how to fix the discrepancies.
If you double click on the error message, a new form
will appear on which you can enter information to correct
the problem. You can also dismiss the suggestion and
proceed on your own. When you think you have corrected
all the problems, click on "Revalidate."
Message: Select Verbose or Terse. Verbose gives a more
detailed explanation of the problem.
Filter: Select the error message(s) you wish to see.
Severity: Select the types of error messages you wish
to see. You will see the type of message selected, as
well as any messages warning of more serious problems.
There are three types of error messages, Warning, Error,
and Fatal. Warning is the least severe, and Fatal is
the most severe. You may submit the record even if it
does contain errors. However, we encourage you to fix
as many problems as possible. Note that some messages
may be merely suggestions, not discrepancies. A possible
Warning message is that a splice site does not match
the consensus. This may be a legitimate result, but
you may wish to recheck the sequence. A possible Error
message is that the conceptual translation of the sequence
which you supplied does not encode an open reading frame.
In this case, you might want to check that you translated
the sequence in the correct reading frame. A possible
Fatal message is that you neglected to include the name
of the organism from which the sequence derives. The
name of the organism is absolutely required for a database
entry.
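The Severity setting described above behaves like a threshold: messages of the selected type are shown together with all more serious ones. The Python sketch below illustrates that behavior only. The function name filter_messages and the representation of a message as a (severity, text) pair are assumptions made for the example, not part of Sequin.

```python
# Ordered from least severe to most severe, as described in the text.
SEVERITY_ORDER = ["Warning", "Error", "Fatal"]


def filter_messages(messages, minimum):
    """Keep messages whose severity is `minimum` or more serious."""
    threshold = SEVERITY_ORDER.index(minimum)
    return [
        (severity, text)
        for severity, text in messages
        if SEVERITY_ORDER.index(severity) >= threshold
    ]


messages = [
    ("Warning", "Splice site does not match the consensus"),
    ("Error", "Conceptual translation does not encode an open reading frame"),
    ("Fatal", "Organism name is missing"),
]
for severity, text in filter_messages(messages, "Error"):
    print(severity, "-", text)
```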
If Sequin does not detect any problems with the format
of your record, you will see a message that "Validation
test succeeded." Click the "Done" button on the submission
viewer, or select "Prepare Submission" under the File
menu. You will be prompted to save the file. E-mail
this file to the database at the address shown. You
MUST e-mail the file; Sequin does not submit the file
automatically over the network. The e-mail addresses
for the databases are:
- GenBank: gb-sub@ncbi.nlm.nih.gov
- EMBL: datasubs@ebi.ac.uk
- DDBJ: ddbjsub@ddbj.nig.ac.jp
After your entry is complete, close the record viewer.
You will be returned to the Welcome to Sequin form,
and can begin another entry.
This pop-up menu shows a list of SeqIDs of all nucleotide
and protein sequences associated with the Sequin entry.
Use the menu to select the sequences displayed in the
record viewer, as well as the sequences you want to
"target," that is, the sequences you want a descriptor
to apply to (see Descriptors
in the Sequin help documentation). You may select
either an individual sequence by name or a set of sequences,
such as All Sequences, or SEG_dna if you have a segmented
nucleotide set. You may change the selection at any
time.
You may change the format of the record viewer to fit
your needs. The formats are described below. Many of
the display formats can be exported (by selecting Export
under the File menu) and opened in a text editor. Edit
fields by double clicking within a block of information.
A new form will appear which will prompt you for information.
Editing a field in one display format will change that
field in all formats. Although some fields can only
be edited in a selected display format, most can also
be edited by selecting the appropriate option from the
Misc or Features menus at the top of the Sequin window
(described below).
This display format shows the entry in a graphical
summary format. It is similar to the view shown in the
Graphic viewer, except that lines are not labeled. The
top bar represents the nucleotide sequence. Lines represent
different features on the sequence, such as a CDS (coding
sequence) or additional sequences, if you have a set
of sequences or have performed a PowerBLAST search.
Double click on an arrow or bar to launch an editing
window. If you have performed a PowerBLAST search, double
clicking on a sequence will launch the Entrez viewer
for that sequence.
This display format shows the entry in a graphical
view. The top bar represents the nucleotide sequence.
Lower arrows or bars represent different features on
the sequence such as a CDS (coding sequence) or additional
sequences, if you have a set of sequences or have performed
a PowerBLAST search. Double click on an arrow or bar
to launch an editing window. If you have performed a
PowerBLAST search, double clicking on a sequence will
launch the Entrez viewer for that sequence. Any sequence
highlighted in the Sequence Editor will be boxed on
the graphical view of the sequence. In order to see
a graphical representation of a segmented set (see
Submission type, above), the Target Sequence must
be set to SEG_dna.
The Style pop-up menu allows you to see the display
in different styles and colors. The default is System.
The Scale pop-up menu allows you to see the display
in different sizes. The smaller the number, the larger
the display.
This display format shows sets of aligned sequences,
such as those imported as part of a population, phylogenetic,
or mutation set, when the Target Sequence pop-up is
set to All Sequences. Each sequence is shown as a bar.
Differences between the sequences are shown as red vertical
lines. To launch the viewer for an individual sequence,
double click on the bar representing the sequence. To
launch the alignment editor, and see the alignment of
all sequences, set the Target Sequence to All Sequences,
the Display Format to Alignment, highlight the alignment
by clicking on the box which surrounds the sequences,
and select Edit Alignment from the Sequin Edit menu.
This display format shows the nucleotide sequence(s)
in the record along with any annotated features (such
as CDS or mRNA). The display changes depending on what
options are selected. Use the Sequences pop-up menu
to choose the nucleotide sequences you want to display.
If there are multiple sequences in the record, select
Aligned to see all sequences. The entire sequence of
the "master" sequence will be shown. Other sequences
will appear as dots where they are identical to the
master, and letters where they are different. If the
multiple sequences are the result of a PowerBLAST search,
the "master" sequence will be that against which the
search was performed. If the multiple sequences were
imported into Sequin as part of a phylogenetic, population,
or mutation study, the "master" sequence can be changed
by selecting different sequences in the Target Sequence
pop-up. You can use the Features pop-up menu to change
the display of the features. You can choose whether
you want features displayed for the sequence selected
in the Target Sequence pop-up, for all aligned sequences,
or for no sequences. With the numbering pop-up menu,
select where you want the sequence numbers to be indicated,
at the side of the window, at the top of each sequence
line, or not at all.
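The Aligned display described above, in which the master sequence is shown in full and the other sequences appear as dots wherever they match it, can be mimicked in a few lines. The Python sketch below is an illustration of the idea rather than Sequin's display code. The function name dot_rows is hypothetical, and the sequences are assumed to be already aligned and of equal length.

```python
def dot_rows(master, others):
    """Return the master row followed by one row per other sequence,
    with "." wherever that sequence is identical to the master."""
    rows = [master]
    for seq in others:
        rows.append(
            "".join("." if a == b else b for a, b in zip(master, seq))
        )
    return rows


master = "TCACTCTTTGGCAACGACCCGTCGTCATAA"
others = [
    "TCACTCTTTGGCAAC---GCGTCGTCACAA",
    "TCACTCTTTGGCAACGACCAGTCGTCACAA",
]
for row in dot_rows(master, others):
    print(row)
```

Choosing a different master simply means passing a different sequence as the first argument, which corresponds to changing the selection in the Target Sequence pop-up.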
This display format allows you to see the submission
as it would appear as a GenBank or DDBJ entry.
This display format allows you to see the submission
as it would appear as an EMBL entry.
This display shows the sequence and Definition line
only, without any annotations, in a format called the
FASTA format. This is a format used by many molecular
biology analysis programs. You cannot edit in this display
mode.
This display shows the entry in Abstract Syntax Notation
1, a data description language used by the NCBI. You
cannot edit in this display mode.
The NCBI DeskTop displays the internal structure of
the record being viewed in Sequin. The
DeskTop is explained under the Misc menu.
This button allows you to validate the entry when you
are finished with the submission. See
Submitting the finished record to the database
in the Sequin help documentation.
If you have downloaded a sequence from Entrez, or if
you have performed a PowerBLAST search, you will see
additional controls at the bottom of the screen. A sequence
downloaded from Entrez will have Entrez neighboring/linking
buttons. For example, by selecting the Nucleotide pop-up,
you will be able to view nucleotide sequences which
are similar to the sequence displayed in the Target
Sequence pop-up. Or select Medline to view any literature
links.
If you perform a PowerBLAST search, you will be able
to retrieve the sequence hits directly from Entrez.
In the Alignment pop-up, select the type of search that
was done. Then click on Retrieve to retrieve the sequences
in an Entrez window. If the original sequence was downloaded
from Entrez, you will see only Entrez neighboring/linking
buttons, not PowerBLAST alignment/retrieve buttons.
Descriptors are annotations which apply to an entire
sequence, or an entire set of sequences, in a given
entry. They do not have a specific location on a sequence,
as they apply to the entire sequence. They can be contrasted
to Features, which apply to
a specific interval of a specific sequence.
You may edit descriptors in one of two ways.
(1) In the record viewer, double click within the text
of the descriptor to bring up a form on which information
can be added.
(2) Choose the option Descriptors from the Annotate
menu.
This menu allows you either to create new descriptors
or to modify existing ones. Select the descriptor that
you wish to modify.
When you first select a descriptor, you will see a
window called "Descriptor Target Control." Using the
target control pop-up menu, select the sequences you
wish this descriptor to cover. The name(s) listed correspond
to the SeqID(s) given to the nucleotide or amino acid
sequences when they were imported into Sequin.
The default selection for this menu is set in the Target
Sequence pop-up menu on the record viewer. You may choose
to have the descriptor cover just one sequence, or a
set of sequences in your entry. If you are creating
a new descriptor, select "Create New." If you wish to
modify a previous descriptor, select "Edit Old."
The following is a list of some of the descriptors
which can be added. Two additional descriptors, those
for Publications and
Biological Source, are described in other sections.
This is for database staff use only. Please do not
modify the date.
This is for database staff use only. Please do not
modify the date.
This is for database staff use only.
This descriptor provides general information about
the genetic context of the sequence. For example, if
your nucleotide sequence is cloned from the region surrounding
the Huntington's Disease gene, you could enter that
information here. Providing information for this descriptor
is optional.
This descriptor is used to list any additional information
which you wish to provide about the sequence. Use of
this descriptor is optional.
This descriptor contains the information which will
go on the Definition line of the database entry. If
you supplied a title for your nucleotide sequence when
you imported it into Sequin, that information is here.
If you wish to change the Definition line, or if you
did not supply a title when you submitted the sequence,
edit this Descriptor. For more information on creating
proper Definition lines, please see the Sequin help
documentation for the
Organism and Sequences Form.
This descriptor indicates the characteristics of the
molecule from which the sequence was derived. The information
which you have already entered can be edited here.
A GenBank sequence can represent one of several different
molecule types. Enter in the Molecule pop-up menu the
type of molecule that was sequenced.
- Genomic DNA: Sequence derived directly from the
DNA of an organism. Note: The DNA sequence of an rRNA
gene has this molecule type.
- Genomic RNA: Sequence derived directly from the
genomic RNA of certain organisms, such as viruses.
- Precursor RNA: An RNA transcript before it is processed
into mRNA, rRNA, tRNA, or other cellular RNA species.
- mRNA[cDNA]: A cDNA sequence derived from mRNA.
- Ribosomal RNA: A sequence derived from the RNA in
ribosomes, for example, the sequence of a cDNA derived
from rRNA.
- Transfer RNA: A sequence derived from the RNA in
a transfer RNA, for example, the sequence of a cDNA
derived from tRNA.
- Small nuclear RNA: A sequence derived from small
nuclear RNA, for example, the sequence of a cDNA derived
from snRNA.
- Small cytoplasmic RNA: A sequence derived from small
cytoplasmic RNA, for example, the sequence of a cDNA
derived from small cytoplasmic RNA.
- Peptide: Do not select this item.
- Other-Genetic [plasmid]: A sequence that is not
normal genetic material but that also is not a
transcription product. Examples include plasmids,
B chromosomes, and F factors.
- Genomic mRNA: Do not select this item.
- Other: Do not select this item.
Choose the appropriate option from the pop-up menu.
- Complete: Use this designation when a complete unit,
such as the complete coding sequence of a gene, is
being submitted.
- Partial: Use this designation when an incomplete
unit, such as the partial coding sequence of a gene,
is being submitted, and it is not known which end
of the sequence is incomplete.
- No left: Use this designation when an incomplete
unit, such as the partial coding sequence of a gene,
or a partial protein sequence, is being submitted.
The sequence has no left if it is incomplete on the
5', or amino-terminal, end.
- No right: Use this designation when an incomplete
unit, such as the partial coding sequence of a gene,
or a partial protein sequence, is being submitted.
The sequence has no right if it is incomplete on the
3', or carboxy-terminal, end.
- No ends: Use this designation when an incomplete
unit, such as the partial coding sequence of a gene,
or a partial protein sequence, is being submitted.
The sequence has no ends if it is incomplete at both
the 5' and 3', or amino- and carboxy- terminal, ends.
- Other: Use this designation when none of the above
descriptions apply.
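The choice among these options can be summarised as a small decision rule. The Python sketch below is only an illustration of the descriptions above; the function name completeness and its parameters are hypothetical and do not correspond to anything in Sequin.

```python
def completeness(is_partial, missing_5prime=None, missing_3prime=None):
    """Suggest a completeness designation from what is known about the ends.

    missing_5prime / missing_3prime: True if that end is known to be
    incomplete; None or False otherwise. (Hypothetical parameters.)
    """
    if not is_partial:
        return "Complete"
    if missing_5prime and missing_3prime:
        return "No ends"
    if missing_5prime:
        return "No left"   # incomplete at the 5' or amino-terminal end
    if missing_3prime:
        return "No right"  # incomplete at the 3' or carboxy-terminal end
    # Partial, but it is not known which end is incomplete.
    return "Partial"
```

For example, a coding sequence known to lack only its 5' end would be reported by this sketch as "No left".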
From the pop-up menu, select the technique that was
used to generate the sequence.
- Standard: Standard sequencing technique.
- EST: Expressed Sequence Tag. Single-pass, low quality
mRNA sequences derived from cDNAs. These sequences
will appear in the EST division.
- STS: Sequence Tagged Site. An EST sequence which
has been mapped onto the genome. These sequences will
appear in the STS division.
- Survey: Single pass genomic sequence. These sequences
will appear in the Genome Survey Sequence (GSS) division.
- Genetic Map: This designation applies to genetic
map information, for example, in the Genomes division.
- Physical Map: This designation applies to physical
map information, for example, in the Genomes division.
- Derived: A sequence assembled into a contig from
shorter sequences.
- Concept-trans: Conceptual translation. A sequence
translation generated with the appropriate genetic
code.
- Seq-pept: The protein sequence was generated by
sequencing of a peptide.
- Both: Protein sequence was generated by conceptual
translation and confirmed by peptide sequencing.
- Seq-pept-Overlap: The protein sequence was generated
by sequencing multiple peptides, and the order of
peptides was determined by overlap in their sequences.
- Seq-pept-Homol: The protein sequence was generated
by sequencing multiple peptides, and the order of
peptides was determined by homology with another protein.
- Concept-Trans-A: A conceptual translation of the
nucleotide sequence provided by the author of the
entry.
- HTGS 1: High Throughput Genome Sequence, Phase 1.
These sequences are produced by high-throughput sequencing
projects, and will be in the HTG division.
- HTGS 2: High Throughput Genome Sequence, Phase 2.
These sequences are produced by high-throughput sequencing
projects, and will be in the HTG division.
- HTGS 3: High Throughput Genome Sequence, Phase 3.
These sequences are produced by high-throughput sequencing
projects, and will be in the HTG division.
- Other: Use this designation when none of the above
descriptions apply.
From the pop-up menu, please select the type of molecule
which was sequenced.
- DNA: DNA
- RNA: RNA
- Protein: Protein
- Nucleotide: Do not select this item.
- Other: Do not select this item.
From the pop-up menu, please select the topology of
the sequenced molecule.
- Linear: Linear molecule (most sequences)
- Circular: Circular molecule (such as a plasmid)
- Tandem: Do not select this item.
- Other: Do not select this item.
From the pop-up menu, please select which strand of
the molecule was sequenced.
- Single: Only one strand was sequenced.
- Double: Both strands were sequenced.
- Mixed: In different regions, either one or both
strands were sequenced.
- Mixed Rev: Do not select this item.
- Other: Do not select this item.
The Biological Source descriptor is described in more
detail below.
Features are annotations which apply to one or more
intervals on a sequence. They can be contrasted to
Descriptors, which apply to an entire sequence
or an entire set of sequences. Features will be added
to the Target Sequence selected in the record viewer
pop-up menu. Most features are indicated on the nucleotide
sequence even if they refer to amino acid sequence motifs.
You may add or modify features in one of three ways.
(1) In the record viewer, double click on the text
of an existing feature to bring up a form on which information
can be added.
(2) Choose the feature from the Annotate menu.
(3) Choose the feature from the Sequence Editor Features
menu.
The features listed in the Annotate menu and the Sequence
Editor Features menu are identical, and the instructions
for adding them are the same, with one exception. If
you annotate them in the Annotate menu, you must provide
the nucleotide sequence location of the feature. However,
if you add features from the Sequence Editor, you do
not need to know their nucleotide coordinates. Simply
highlight the sequence which the feature covers, and
the location of the sequence will be automatically entered
in the feature location box.
This menu allows you to add or modify features on the
sequence selected in the Target Sequence pop-up menu
of the record viewer. Features are grouped into six
categories. Select the feature which you would like
to mark on your sequence. A new form will appear.
Feature forms share a common design. The first page
is specific to the particular feature, e.g., Coding
Region or Gene. The second page lists Properties of
the Feature. The third page describes the Location of
the feature. Fill out the appropriate information on
the first page.
This page lists the properties of the feature described
by the citation.
Enter general comments about the feature here.
Select any of the flags if necessary. If this sequence
contains only a partial representation of the feature
you are describing, check the "Partial" box. Check the
"Exception" box if the feature annotates a post-transcriptional
modification of the nucleotide sequence, such as ribosomal
slippage or RNA editing. Use the pop-up menu to select
what kind of evidence supports the existence of the feature.
If it was confirmed experimentally, select Experimental.
If you have no experimental evidence to support the
feature, select Non-Experimental. If you do not wish
to select either option, select the blank line.
Most features are associated with a particular gene,
normally the gene from which the nucleotide sequence
derives. Select the name of the gene in the pop-up menu.
If you want to add the name of a new gene, select new,
and enter its name and optional description. By default,
mapping between the feature and the gene is done by
overlap, that is, the gene associated with the feature
is the gene whose location overlaps with the location
of the feature. Under some circumstances, for example,
if the sequences of two genes overlap, you may wish
the feature to apply to a different gene. In this case,
select cross-reference, and enter the name of the new
gene in the pop-up menu. If you do not want the feature
to map to any gene, select suppress. You may also edit
information on the Gene feature form by clicking on
Edit Gene Feature.
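The three mapping choices described above (overlap, cross-reference, and suppress) can be sketched in Python as follows. This is a simplified illustration, not Sequin's implementation: the function name gene_for_feature is hypothetical, locations are assumed to be single (start, stop) intervals, and when several genes overlap the feature the sketch simply returns the first one, since the rule Sequin uses to break such ties is not described here.

```python
def gene_for_feature(feature, genes, cross_reference=None, suppress=False):
    """Return the name of the gene a feature maps to, or None.

    feature: (start, stop) interval of the feature.
    genes:   dict mapping gene name -> (start, stop) interval.
    """
    if suppress:
        return None                 # the feature maps to no gene
    if cross_reference is not None:
        return cross_reference      # explicit choice overrides overlap
    f_start, f_stop = feature
    for name, (g_start, g_stop) in genes.items():
        # Two intervals overlap if each starts before the other ends.
        if g_start <= f_stop and f_start <= g_stop:
            return name
    return None


genes = {"geneA": (1, 500), "geneB": (450, 900)}
print(gene_for_feature((460, 480), genes))                           # geneA
print(gene_for_feature((460, 480), genes, cross_reference="geneB"))  # geneB
```

The second call shows why cross-reference is useful: the feature lies in a region where two genes overlap, so the default mapping may not pick the gene you intend.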
Add any comments about the feature here, especially
if you checked the "Exception" box on the General Subpage.
This page is used to list any citations which specifically
apply to the feature you are annotating. The citation
must have already been entered into the record (see
Publications in the Sequin help documentation). Click
on Edit Citations, and place a check mark in the box
next to the publication you want
to cite. In order to keep the size of the database down,
we discourage the liberal use of citations on features.
This page is used to cross-reference this entry to
entries in external databases (databases other than
GenBank, EMBL/EBI, and DDBJ), such as dbEST or FLYBASE.
Most users should leave this page blank. For more information
on this topic, see the International Nucleotide Sequence
Database Collaboration page.
This page allows you to select the location of the
feature you are citing. Each feature must have a sequence
interval associated with it. In most cases, Sequin will
know whether the feature applies to a nucleic acid or
a protein sequence, and will limit the options you can
select, accordingly.
Sequin is a submission tool for nucleotide sequence
databases. Thus, the location for most features is indicated
on the nucleotide sequence only. For example, even though
mature peptide, signal peptide, and transit peptide
describe protein sequence, their location is indicated
on the DNA.
Check the 5' Partial or 3' Partial box if the feature
in your nucleic acid sequence is missing residues at
the 5' or 3' ends, respectively. Check the NH2 Partial
or COO Partial if the feature in your amino acid sequence
is missing residues at the amino- or carboxy-terminal
ends, respectively.
Enter the sequence range of the feature. The numbers
should correspond to the nucleotide sequence interval
if the SeqID is set to a nucleotide sequence, and to
an amino acid sequence interval if the SeqID is set
to a protein sequence. If the feature spans multiple,
non-continuous intervals on the sequence, indicate the
beginning and end points of each interval. If each interval
is separate, and should not be joined with the others
to describe the feature, check the Intersperse intervals
with gaps box (for example, when annotating multiple
primer binding sites). If the feature is composed of
several intervals which should all be joined together,
do not check the box (for example, when annotating mRNA
on a genomic DNA sequence).
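To show how partial flags and multiple intervals combine into a single location, the Python sketch below writes a location in the notation of the DDBJ/EMBL/GenBank feature table, where "<" and ">" mark partial ends, join(...) marks intervals that are joined together, and order(...) marks intervals that are kept separate. The function name format_location is hypothetical, and treating the Intersperse intervals with gaps box as equivalent to order(...) is an assumption made for this example. Only plus-strand locations are handled.

```python
def format_location(intervals, joined=True, partial_5=False, partial_3=False):
    """Format 1-based (start, stop) intervals as a feature table location.

    joined=False stands in for the "Intersperse intervals with gaps" box
    (an assumption of this sketch).
    """
    parts = []
    last = len(intervals) - 1
    for i, (start, stop) in enumerate(intervals):
        left = "<" if partial_5 and i == 0 else ""
        right = ">" if partial_3 and i == last else ""
        parts.append(f"{left}{start}..{right}{stop}")
    if len(parts) == 1:
        return parts[0]
    operator = "join" if joined else "order"
    return f"{operator}({','.join(parts)})"


# An mRNA made of two exons, missing residues at its 5' end:
print(format_location([(1, 100), (200, 300)], partial_5=True))
# Two separate primer binding sites:
print(format_location([(10, 30), (400, 420)], joined=False))
```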
For nucleic acid Features only: From the pop-up menu,
select the strand on which the feature is found.
- Plus: Plus strand, or coding strand.
- Minus: Minus strand, or noncoding strand.
- Both: Both strands.
- Reverse: Do not select this item.
- Other: Do not select this item.
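The effect of the strand setting can be illustrated with a short Python sketch that extracts the residues of a feature from a nucleotide sequence. The function name feature_sequence is hypothetical; coordinates are assumed to be 1-based and inclusive, and only unambiguous DNA bases are complemented.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def feature_sequence(sequence, start, stop, strand="plus"):
    """Return the residues covered by a feature at start..stop (1-based)."""
    segment = sequence[start - 1:stop]
    if strand == "minus":
        # A minus-strand feature reads as the reverse complement.
        return segment.translate(_COMPLEMENT)[::-1]
    return segment


print(feature_sequence("AAACCCGGGTTT", 1, 6))                  # AAACCC
print(feature_sequence("AAACCCGGGTTT", 1, 6, strand="minus"))  # GGGTTT
```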
Use the pop-up menu to select the SeqID of the sequence
you are describing by the location.
A brief description of the available features follows.
A detailed explanation of how to use the coding region
(CDS) feature is included. The DDBJ/EMBL/GenBank feature
table definition
page provides detailed information about other
features.
a related individual or strain contains stable, alternative
forms of the same gene which differ from the presented
sequence at this location (and perhaps others)
1) region of DNA at which regulation of termination
of transcription occurs, which controls the expression
of some bacterial operons; 2) sequence segment located
between the promoter and the first structural gene that
causes partial termination of transcription
Constant region of immunoglobulin light and heavy chains,
and T-cell receptor alpha, beta, and gamma chains. Includes
one or more exons depending on the particular chain
CAAT box; part of a conserved sequence located about
75 bp upstream of the start point of eukaryotic transcription
units which may be involved in RNA polymerase binding;
consensus=GG(C or T)CAATCT
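A consensus such as the one above can be used to locate candidate sites before annotating them. The Python sketch below searches for exact matches to GG(C or T)CAATCT and reports 1-based intervals suitable for the feature location. The function name find_caat_boxes is hypothetical, and real sites often deviate from the consensus, so exact matching is only a first pass.

```python
import re

# GG(C or T)CAATCT written as a regular expression.
CAAT_CONSENSUS = re.compile(r"GG[CT]CAATCT")


def find_caat_boxes(sequence):
    """Return 1-based, inclusive (start, stop) intervals of exact matches."""
    return [(m.start() + 1, m.end())
            for m in CAAT_CONSENSUS.finditer(sequence.upper())]


print(find_caat_boxes("ACGTGGCCAATCTTTGGTCAATCTA"))
```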
coding sequence; sequence of nucleotides that corresponds
with the sequence of amino acids in a protein (location
includes stop codon). Feature includes amino acid conceptual
translation
Most users add a coding region to their sequence when
they fill out the Organism and Sequences form. However,
you may need to edit the coding region, or add additional
ones. Choose CdRgn under the Coding Regions and Transcripts
submenu of the Features menu, or, to edit an existing
CDS, double click on the record viewer. If you appended
the partial sequence of a coding region to the Organism
and Sequences form, you will probably need to edit the
Coding Region feature to avoid validation error messages
about the location of the coding region.
Choose the genetic code which should be used to translate
the nucleotide sequence. For more information, and for
the translation tables themselves, see the NCBI taxonomy
page.
Choose the reading frame in which to translate the
sequence.
Click on Launch Product Viewer to see the record viewer
for the coding region.
Supply additional information about the protein by
clicking on Edit Protein Information to launch the Protein
feature forms. The protein name must have already been
filled out on the Protein subpage.
If retranslate on accept is checked, Sequin will, when
you click on Accept, translate the nucleotide sequence
according to the interval(s) indicated on the Locations
page. This new translation will replace any earlier
translations you have supplied. This should not be a
problem if the interval was indicated appropriately.
However, if you want to make sure that Sequin does not
retranslate the sequence, do not check the box.
If the coding sequence which you supply is a partial
sequence, and you have checked a Partial box on the
Location subpage, it is a good idea to check the Synchronise
Partials box. In this case, Sequin will ensure that
all other appropriate features (such as protein) are
also marked as partial.
Exceptions describe places where there is a post-translational
modification. Enter the amino acid position at which
the modification occurs, and select the amino acid which
is actually represented in the protein from the pop-up
list. Sequin will change the amino acid number to a
nucleotide interval.
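The conversion from an amino acid number to a nucleotide interval is simple arithmetic when the coding region is a single plus-strand interval. The Python sketch below shows that case only; the function name aa_to_nt_interval is hypothetical, and coding regions made of several intervals or located on the minus strand would need additional bookkeeping that is not shown.

```python
def aa_to_nt_interval(aa_position, cds_start, frame=1):
    """Map a 1-based amino acid position to its codon's nucleotide interval.

    cds_start: 1-based nucleotide position where the coding region begins.
    frame:     reading frame (1, 2, or 3) within the coding region.
    """
    offset = (frame - 1) + 3 * (aa_position - 1)
    start = cds_start + offset
    return (start, start + 2)   # a codon covers three nucleotides


print(aa_to_nt_interval(3, cds_start=10))   # third codon of a CDS starting at 10
```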
Use this page to enter or edit a name or description
of the protein product. For a new sequence, enter information
directly into the boxes. You can edit descriptions of
an existing sequence by clicking on Edit Protein Information,
which will bring up the Protein feature form.
Choose the sequence you wish to view by selecting its
name under the Product pop-up menu. You may also import
a new protein sequence by selecting Import Protein FASTA
under the file menu. The sequence should be formatted
as described above on the Organism and Sequences form.
After you have imported a protein sequence, click on
Predict Interval. This function will predict the interval
on the nucleotide sequence to which the coding region
applies. If you do not select this function, the interval
will likely be wrong, and you will get an error message
when you attempt to validate the record. If your sequence
is a 5' or 3' partial, you must first indicate this
manually on the Locations Page.
You may also have Sequin generate the protein sequence
from the nucleotide sequence by clicking on Translate
Product. However, unless the location of the coding
region is correctly indicated on the Location page,
Sequin will translate the entire nucleotide sequence,
including potential 5' and 3' untranslated regions.
This will likely result in error messages when you attempt
to validate the record. You must also select the correct
reading frame on the General subpage.
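The following Python sketch illustrates why the location and the reading frame both matter when a protein is generated from the nucleotide sequence. It is not Sequin's translation routine: the function name translate is hypothetical, only the Standard genetic code is included, and codons containing ambiguous bases are written as X.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(sequence, start=1, stop=None, frame=1):
    """Translate sequence[start..stop] (1-based) in the given reading frame."""
    stop = len(sequence) if stop is None else stop
    region = sequence[start - 1:stop].upper().replace("U", "T")
    region = region[frame - 1:]          # skip 0, 1, or 2 leading bases
    protein = []
    for i in range(0, len(region) - 2, 3):
        protein.append(CODON_TABLE.get(region[i:i + 3], "X"))
    return "".join(protein)


seq = "GGATGGCCTAAGG"
print(translate(seq))          # whole sequence, including untranslated bases
print(translate(seq, 3, 11))   # only the interval covering ATG GCC TAA
```

Translating the whole sequence reads through the flanking bases in the wrong frame, whereas restricting the translation to the correct interval yields the intended product ending at the stop codon.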
independent determinations of the "same" sequence differ
at this site or region
displacement loop; a region within mitochondrial DNA
in which a short stretch of RNA is paired with one strand
of DNA, displacing the original partner DNA strand in
this region; also used to describe the displacement
of a region of one strand of duplex DNA by a single
stranded invader in the reaction catalysed by RecA protein
Diversity segment of immunoglobulin heavy chain, and
T-cell receptor beta chain
a cis-acting sequence that increases the utilization
of (some) eukaryotic promoters, and can function in
either orientation and in any location (upstream or
downstream) relative to the promoter
region of genome that codes for portion of spliced
mRNA; may contain 5'UTR, all CDSs, and 3' UTR
GC box; a conserved GC-rich region located upstream
of the start point of eukaryotic transcription units
which may occur in multiple copies or in either orientation;
consensus=GGGCGG
intervening DNA; DNA which is eliminated through any
of several kinds of recombination
a segment of DNA that is transcribed, but removed from
within the transcript by splicing together the sequences
(exons) on either side of it
Joining segment of immunoglobulin light and heavy chains,
and T-cell receptor alpha, beta, and gamma chains
long terminal repeat, a sequence directly repeated
at both ends of a defined sequence, of the sort typically
found in retroviruses
mature peptide or protein coding sequence; coding sequence
for the mature or final peptide or protein product following
post-translational modification. The location does not
include the stop codon (unlike the corresponding CDS)
site in nucleic acid which covalently or non-covalently
binds another moiety that cannot be described by any
other Binding key (primer_bind or protein_bind)
feature sequence is different from that presented in
the entry and cannot be described by any other Difference
key (conflict, unsure, old_sequence, mutation, variation,
allele, or modified_base)
region of biological interest which cannot be described
by any other feature key
misc_recomb: site of any generalised, site-specific or
replicative recombination event where there is a breakage
and reunion of duplex DNA that cannot be described by
other recombination keys (iDNA and virion) or qualifiers
of source key (/insertion_seq, /transposon, /proviral)
any transcript or RNA product that cannot be defined
by other RNA keys (prim_transcript, precursor_RNA, mRNA,
5'clip, 3'clip, 5'UTR, 3'UTR, exon, intron, polyA_site,
rRNA, tRNA, scRNA, and snRNA)
any region containing a signal controlling or altering
gene function or expression that cannot be described
by other Signal keys (promoter, CAAT_signal, TATA_signal,
-35_signal, -10_signal, GC_signal, RBS, polyA_signal,
enhancer, attenuator, terminator, and rep_origin)
any secondary or tertiary structure or conformation
that cannot be described by other Structure keys (stem_loop
and D-loop)
the indicated nucleotide is a modified nucleotide and
should be substituted for by the indicated molecule
(given in the mod_base qualifier value)
messenger RNA; includes 5'untranslated region (5'UTR),
coding sequences (CDS, exon) and 3'untranslated region
(3'UTR)
a related strain has an abrupt, inheritable change
in the sequence at this location
Extra nucleotides inserted between rearranged immunoglobulin
segments
the presented sequence revises a previous version of
the sequence at this location
recognition region necessary for endonuclease cleavage
of an RNA transcript that is followed by polyadenylation;
consensus=AATAAA
site on an RNA transcript to which will be added adenine
residues by post-transcriptional polyadenylation
any RNA species that is not yet the mature RNA product;
may include 5' clipped region (5'clip), 5' untranslated
region (5'UTR), coding sequences (CDS, exon), intervening
sequences (intron), 3' untranslated region (3'UTR),
and 3' clipped region (3'clip)
primary (initial, unprocessed) transcript; includes
5' clipped region (5'clip), 5' untranslated region (5'UTR),
coding sequences (CDS, exon), intervening sequences
(intron), 3' untranslated region (3'UTR), and 3' clipped
region (3'clip)
Non-covalent primer binding site for initiation of
replication, transcription, or reverse transcription.
Includes site(s) for synthetic, e.g., PCR primer, elements
region on a DNA molecule involved in RNA polymerase
binding to initiate transcription
non-covalent protein binding site on nucleic acid
ribosome binding site
region of genome containing repeating units
single repeat element
origin of replication; starting site for duplication
of nucleic acid to give two identical copies
mature ribosomal RNA; the RNA component of the ribonucleoprotein
particle (ribosome) which assembles amino acids into
proteins
Switch region of immunoglobulin heavy chains. Involved
in the rearrangement of heavy chain DNA leading to the
expression of a different immunoglobulin class from
the same B-cell
many tandem repeats (identical or related) of a short
basic repeating unit; many have a base composition or
other property different from the genome average that
allows them to be separated from the bulk (main band)
genomic DNA
small cytoplasmic RNA; any one of several small cytoplasmic
RNA molecules present in the cytoplasm and (sometimes)
nucleus of a eukaryote
signal peptide coding sequence; coding sequence for
an N-terminal domain of a secreted protein; this domain
is involved in attaching nascent polypeptide to the
membrane; leader sequence
small nuclear RNA; any one of many small RNA species
confined to the nucleus; several of the snRNAs are involved
in splicing or other RNA processing reactions
identifies the biological source of the specified span
of the sequence. This key is mandatory. Every entry
will have, as a minimum, a single source key spanning
the entire sequence. More than one source key per sequence
is permitted
hairpin; a double-helical region formed by base-pairing
between adjacent (inverted) complementary sequences
in a single strand of RNA or DNA
Sequence Tagged Site. Short, single-copy DNA sequence
that characterises a mapping landmark on the genome
and can be detected by PCR. A region of the genome can
be mapped by determining the order of a series of STSs
TATA box; Goldberg-Hogness box; a conserved AT-rich
heptamer found about 25 bp before the start point of
each eukaryotic RNA polymerase II transcript unit which
may be involved in positioning the enzyme for correct
initiation; consensus=TATA(A or T)A(A or T)
sequence of DNA located either at the end of the transcript
or adjacent to a promoter region that causes RNA polymerase
to terminate transcription; may also be site of binding
of repressor protein
transit peptide coding sequence; coding sequence for
an N-terminal domain of a nuclear-encoded organellar
protein; this domain is involved in post-translational
import of the protein into the organelle
mature transfer RNA, a small RNA molecule (75-85 bases
long) that mediates the translation of a nucleic acid
sequence into an amino acid sequence
author is unsure of exact sequence in this region
Variable region of immunoglobulin light and heavy chains,
and T-cell receptor alpha, beta, and gamma chains. Codes
for the variable amino terminal portion. Can be made
up from V_segments, D_segments, N_regions, and J_segments
Variable segment of immunoglobulin light and heavy
chains, and T-cell receptor alpha, beta, and gamma chains.
Codes for most of the variable region (V_region) and
the last few amino acids of the leader peptide
a related strain contains stable mutations from the
same gene (e.g., RFLPs, polymorphisms, etc.) which differ
from the presented sequence at this location (and possibly
others)
viral genomic sequence as it is encapsidated, as distinguished
from its proviral form (integrated in a host cell's
chromosome)
3'-most region of a precursor transcript that is clipped
off during processing
region near or at the 3' end of a mature transcript
(usually following the stop codon) that is not translated
into a protein; trailer
5'-most region of a precursor transcript that is clipped
off during processing
region near or at the 5' end of a mature transcript
(usually preceding the initiation codon) that is not
translated into a protein; leader
Pribnow box; a conserved region about 10 bp upstream
of the start point of bacterial transcription units
which may be involved in binding RNA polymerase; consensus=TAtAaT
a conserved hexamer about 35 bp upstream of the start
point of bacterial transcription units; consensus =
TTGACa or TGTTGACA
This annotation is very important, as an entry cannot
be processed by the databases unless it includes some
basic information about the organism from which the
sequence derived. This basic information was entered
previously in the submission, in the Organism and Sequences
Form. The more detailed Organism Information form allows
you to alter or add to the data you entered earlier.
Sequin allows two types of biological source information
to be entered, Biological Source Descriptors and Biological
Source Features. Biological Source Descriptors, like
other descriptors, provide organism information about
an entire sequence, or an entire set of sequences, in
an entry. Biological Source Features, like other features,
provide organism information about a specific interval
on a given sequence.
In most cases, you will want to use a Biological Source
Descriptor, as all the sequences in the entry will derive
from the same source. However, if you have sequenced
a chimeric molecule, for example, one that is part yeast
and part mouse, you would use Biological Source Features
to annotate which sequence derived from yeast and which
from mouse.
To add a Biological Source Descriptor, select Biological
Source under the Descriptor section of the Annotate
menu. To add a Biological Source Feature, select Biological
Source under the Bibliographic and Comments section
of the Annotate menu.
Annotating a Biological Source Descriptor or Feature
is similar to annotating any descriptor or feature.
For help in creating descriptors and features, see the
appropriate section of the help documentation. The following
are instructions for filling out Biological Source-specific
forms.
The scrollable list contains the scientific names of
many organisms. To reach a name on the list, either
type the first few letters of the scientific name, or
use the thumb bar. Click on a name from the list to
fill out the scientific name field. If there is a common
name for the organism, that field will be filled out
automatically. You may also directly type in the scientific
name. If you have any questions about the scientific
or common name of an organism, see the NCBI
taxonomy browser.
From the selection list, please enter the location
of the genome which contains your sequence. Most entries
will have a "Genomic" location. The following is a brief
description of the choices in this pop-up menu:
- Genomic: Sequence is located in a chromosome. This
category includes mitochondrial and chloroplast proteins
which are encoded by the nuclear genome.
- Chloroplast: Sequence is found in plant chloroplast
DNA.
- Chromoplast: Sequence is found in the DNA of a plant
or algal chromoplast, a plastid containing a colored
pigment.
- Kinetoplast: Sequence is found in the DNA of a trypanosome
kinetoplast.
- Mitochondrion: Sequence is found in mitochondrial
DNA.
- Plastid: Sequence is found in the DNA of a plant
or algal plastid.
- Macronuclear: Sequence is found in the macronucleus
of a ciliated unicellular organism.
- Extrachromosomal: Sequence is found in another extrachromosomal
element not listed here, such as a B chromosome or
an F factor.
- Plasmid: Sequence is on a bacterial plasmid.
- Transposon: Sequence is from a transposable element.
- Insertion sequence: Sequence is from an integrated
transposon.
- Cyanelle: Sequence is from an algal cyanelle.
- Proviral: Sequence is from an integrated viral chromosome.
- Natural: Do not select this item.
- Natural Mutant: Do not select this item.
- Mutant: Do not select this item.
- Artificial: Do not select this item.
- Synthetic: Do not select this item.
- Other: Do not select this item.
Please use this field to select the genetic code which
should be used to translate the nucleic acid sequence.
The genetic code for a eukaryotic organism is "Standard".
If you selected an organism name from the scrollable
list described above, this field was filled out automatically.
Listed here are the translation tables which can be
selected. For more information, and for the translation
tables themselves, see the NCBI taxonomy
page.
- Standard
- Vertebrate mitochondrial
- Yeast mitochondrial
- Mold mitochondrial, etc. This selection includes
mold, protozoan, and coelenterate mitochondria as
well as mycoplasma and spiroplasma.
- Invertebrate mitochondrial
- Ciliate nuclear, etc. This selection includes ciliate,
dasycladacean and hexamita nuclei
- Echinoderm mitochondrial
- Euplotid nuclear
- Bacterial. This selection includes all eubacteria
and archaebacteria.
- Alternative yeast nuclear
- Ascidian mitochondrial
- Flatworm mitochondrial
- Blepharisma macronuclear
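The effect of this selection can be illustrated with a short
Python sketch. It is not part of Sequin: the function and variable
names are hypothetical, and only two of the tables listed above
(Standard and Vertebrate mitochondrial) are included. The sketch
shows that the same nucleotide sequence can translate differently
depending on the genetic code that is selected.

```python
from itertools import product

# Codons in the conventional T, C, A, G order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

# One amino acid letter per codon, in the order above; "*" marks a stop codon.
# Only two of the translation tables are reproduced in this sketch.
AMINO_ACIDS = {
    "Standard":
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    "Vertebrate mitochondrial":
        "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
}


def translate(sequence, genetic_code="Standard"):
    """Translate a nucleotide sequence, reading frame 1, with the named code."""
    table = dict(zip(CODONS, AMINO_ACIDS[genetic_code]))
    sequence = sequence.upper().replace("U", "T")
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    # Codons containing ambiguous bases are shown as "X".
    return "".join(table.get(sequence[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    seq = "ATGATATGAAGA"
    print(translate(seq, "Standard"))                  # MI*R
    print(translate(seq, "Vertebrate mitochondrial"))  # MMW*
```

In this example, TGA is a stop codon under the Standard code but
encodes tryptophan under the Vertebrate mitochondrial code, which
is why choosing the wrong code can truncate or extend a translation.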
This information is normally entered by the database
staff. They will use the
taxonomy database maintained by the NCBI/GenBank.
If you wish to enter a taxonomic lineage which differs
from that in the NCBI database, please enter it here.
If you are running Sequin in its
network-aware mode, you will see a button labelled
"Lookup Taxonomy." Click on this button to perform an
automatic lookup of the taxonomic lineage of the organism.
Sequin will perform the lookup by accessing the Taxonomy
database at the NCBI, and will fill out the Taxonomic
Lineage and Division fields.
If you have any comments about the taxonomic lineage
determined by Sequin, please submit these comments with
your entry. Under the Sequin File menu, select Edit
Submitter Info. Enter your comments in the box entitled
"Special Instructions to Database Staff", on the Submission
page. Someone from the NCBI will contact you after your
submission is received.
This page allows you to enter additional information
about the source and/or organism. Entering information
is optional.
Choose a modifier from the pull-down menu on the left
side of the page and type the appropriate name on the
right side of the page. If you do not find appropriate
modifiers in the scroll down list, you can enter additional
source information as text in the field at the bottom
of the page. You may select as many modifiers as you
want.
The following is a description of the available modifiers:
- Chromosome: Chromosome to which the gene maps.
- Map: Map location of the gene.
- Clone: Name of clone from which sequence was obtained.
- Subclone: Name of subclone from which sequence was
obtained.
- Haplotype: Haplotype of the organism.
- Genotype: Genotype of the organism.
- Sex: Sex of the organism from which the sequence
derives.
- Cell-line: Cell line from which sequence derives.
- Cell-type: Type of cell from which sequence derives.
- Tissue-type: Type of tissue from which sequence
derives.
- Clone-lib: Name of library from which sequence was
obtained.
- Dev-stage: Developmental stage of organism.
- Frequency: Frequency of occurrence of a feature.
- Germline: If the sequence shown is DNA and a member
of the immunoglobulin family, this qualifier is used
to denote that the sequence is from unrearranged DNA.
- Rearranged: If the sequence shown is DNA and a member
of the immunoglobulin family, this qualifier is used
to denote that the sequence is from rearranged DNA.
- Lab-host: Laboratory host used to propagate the
organism from which the sequence was derived.
- Pop-variant: Name of the population variant from
which the sequence was obtained.
- Tissue-lib: Tissue library from which the sequence
was obtained.
- Plasmid-name: Name of plasmid from which the sequence
was obtained.
- Transposon-name: Name of transposable element from
which the sequence was obtained.
- Ins-sequence-name: Name of insertion element from
which the sequence was obtained.
- Plastid-name: Name of plastid from which the sequence
was obtained.
Choose a modifier from the pull-down menu on the left
side of the page and type the appropriate name on the
right side of the page. If you do not find appropriate
modifiers in the scroll down list, you can enter additional
organism information as text in the field at the bottom
of the page. You may select as many modifiers as you
want.
The following is a description of the available modifiers:
- Strain: Strain of organism from which sequence was
obtained.
- Substrain: Sub-strain of organism from which sequence
was obtained.
- Type: Type of organism from which sequence was obtained.
- Subtype: Subtype of organism from which sequence
was obtained.
- Variety: Variety of organism from which sequence
was obtained.
- Serotype: Variety of a species (usually a fungus,
bacterium or virus) characterised by its antigenic
properties. Same as serogroup and serovar.
- Serogroup: See serotype.
- Serovar: See serotype.
- Cultivar: Variety of plant from which sequence was
obtained.
- Pathovar: Variety of a species (usually a fungus,
bacterium or virus) characterised by the biological
target of the pathogen. Examples include Pseudomonas
syringae pathovar tomato and Pseudomonas syringae
pathovar tabaci.
- Chemovar: Variety of a species (usually a fungus,
bacterium or virus) characterised by its biochemical
properties.
- Biovar: Variety of a species (usually a fungus,
bacterium or virus) characterised by some specific
biological property (often geographical, ecological,
or physiological). Same as biotype.
- Biotype: See biovar.
- Group: Do not select this item.
- Subgroup: Do not select this item.
- Isolate: Identification or description of the specific
individual from which this sequence was obtained.
An example is Patient X14.
- Common: Common name of the organism from which sequence
was obtained.
- Acronym: Standard synonym (usually of a virus) based
on the initials of the formal name. An example is
HIV-1.
- Dosage: Do not select this item.
- Natural Host: When the sequence submission is from
an organism that exists in a symbiotic, parasitic
or other special relationship with some second organism,
the 'natural host' modifier can be used to identify
the name of the host species.
- Sub-species: Sub-species of organism from which
sequence was obtained.
- Old name: Do not select this item.
If there are alternative names for the organism from
which the sequence derives, enter them here. Please
be aware that this is the appropriate field only for
alternative names for the organism, not for alternative
gene or protein names.
This page is used to cross-reference this entry to
entries in external databases (databases other than
GenBank, EMBL/EBI, and DDBJ), such as dbEST or FLYBASE.
Most users should leave this page blank. For more information
on this topic, see the International Nucleotide Sequence
Database Collaboration
page.
Sequin allows two types of publications to be entered,
Publication Descriptors and Publication Features. Publication
Descriptors are bibliographic references which, like
other descriptors, cover an entire sequence, or an entire
set of sequences, in an entry. Publication Features
are bibliographic references which, like other features,
cover a specific interval on a given sequence.
Publications are entered into the Reference field of
the database entry. References are citations of unpublished,
in press, or published works which are relevant to the
submitted sequence. You may add as many citations as
you wish. However, you must decide whether the Publication
should be entered as a descriptor or a feature.
In general, if there is one publication describing
a sequence, a Publication Descriptor should be used.
If multiple publications all describe the same sequence,
only the reference which appeared first in print should
be included in the descriptor. To enter a Publication
Descriptor, select Publications under the Annotate menu,
and click on Publication Descriptor.
However, if one publication describes the cloning of
the 5' end of a gene, and another publication describes
the cloning of the 3' end of the gene, Publication Features
should be used. To make a publication feature, first,
enter the publication itself into the record by choosing
Publication Feature in the Publications section of the
Annotate menu. Enter the information about the publication,
and then enter the interval (on the Location page) that
the publication refers to. Next, add this reference
to the feature by double clicking on the feature, going
to the Citations subpage of the Properties page, clicking
on Edit Citations, and selecting the citation(s) you
want to add.
To keep the size of the database down, we discourage
the liberal use of Publication Features in an entry.
Please enter a reference for a feature only if it is
a novel or controversial feature.
If you plan to add a reference to a published journal
article, you should run Sequin in its network-aware
mode. In this mode, the program, if supplied with certain
minimal information, automatically fills out the Title,
Authors, and Journal pages by looking up the information
in the Medline database. Instructions
for making Sequin network aware are provided in
the documentation for the Misc menu.
Instructions for performing a Medline lookup are
provided below.
Annotating a Publication Descriptor or a Publication
Feature is similar to annotating any descriptor or feature.
For help in creating descriptors and features, see the
appropriate section of the help documentation. The following
are instructions for filling out Publication-specific
forms.
Using the radio buttons, select one of the three options:
- Unpublished: Select this option if (1) There are
no plans to publish a manuscript describing this sequence
submission, (2) A manuscript has been written but
not yet submitted, or, (3) A manuscript has been submitted
for publication but has not yet been accepted.
- In Press: The article has been accepted for publication
but is not yet in print.
- Published: The article has been published.
Using the radio buttons, select the type of publication
in which the sequence will appear.
- Journal: Journal
- Book Chapter: Chapter in a book
- Book: Entire book
- Thesis/Monograph: Thesis or monograph
- Proceedings Chapter: Abstract from a meeting
- Proceedings: A meeting
- Patent: Patent
- Submission:
Please select one or more options using the radio buttons:
- Refers to the entire sequence: The publication listed
in this citation describes the entire sequence in
this entry. This is most commonly the publication
in which the sequence is first described.
- Refers to part of the sequence: The publication
listed in this citation describes only a part of the
sequence. This option should be used when (1) Different
parts of a sequence have been generated by different
investigators, or (2) The sequence was generated in
segments over a period of time.
- Cites a feature on the sequence: The publication
listed in this citation describes or refers to a feature
(such as a class of promoter or a motif) which has
been found in the sequence.
After you have filled out the Citation on Entry form,
click on "Proceed" to see the next form.
At the bottom of each page are two options: Accept
and Cancel. Click on "Accept" to replace all parts of
an existing citation with the information entered on
this form. Click "Cancel" to cancel the process of changing
a citation.
Please enter the names of the authors. Note that the
first name of the author is listed first. You can add
as many authors to this page as you wish. After you
type in the name of the third author, the box becomes
a spreadsheet, and you can scroll down to the next line
by using the thumb bar.
Please enter information about the institution with
which the principal author of the manuscript is affiliated.
Other pages in the Citation Information Form will be
different depending on the Class of publication selected
in the Citation on Entry Form. Instructions for filling
out the Citation Information Form for Journals are included
here.
Enter the title of the manuscript in the box.
Fill in the appropriate Journal, Volume, Issue, Pages,
Day, and Year fields by typing information into the
boxes. Select the month with the pop-up menu. If necessary,
choose an option from the Erratum pop-up menu and explain
the erratum. If you know the MUID, the Medline Unique
Identifier, please enter it.
If you are running Sequin in its
network-aware mode, the program will look up the
Title, Author, and Journal information in the Medline
database if you supply it with some minimal information.
For example, if you know the MUID (Medline Unique Identifier)
of the publication, enter it in the appropriate box
and select "Lookup By MUID." Sequin will automatically
retrieve the rest of the information. One way to find
the MUID of the publication is to look up the publication
with the NCBI's
Entrez service. Alternatively, if you do not know
the MUID, enter the Journal, Volume, Pages, and Year.
Then select "Lookup Article." Sequin will retrieve the
missing Title and Authors information.
This can be any additional comment that should be associated
with this citation.
Details about the current version of Sequin.
Launches the help documentation.
Open an existing entry. This option will open a record
which has been previously saved in Sequin. Furthermore,
for analysis purposes, it can also open a FASTA-formatted
sequence file. The sequence will be displayed in Sequin
and can be analyzed with tools such as PowerBLAST, but
it should not be submitted, because it does not have
the appropriate annotations.
Opens a FASTA formatted sequence directly into the
Sequence Editor. The sequence can then be analysed with
tools such as the "Find" command.
Close this entry.
Import previously saved information. The type of information
which can be imported depends on which window is open
in Sequin. For example, Import, used in conjunction
with Export, allows you to save pages from a form and
import them into multiple submissions.
Save information from a window to a file. The type
of information which can be exported depends on which
window is open in Sequin. For example, Export, used
in conjunction with Import, allows you to save pages
from a form and import them into multiple submissions.
Duplicates the entry. You can then view the entry simultaneously
in different Display Formats.
Saves the entry. Note: This merely saves the entry
so you can go back and edit it. It does not prepare
the entry for submission to the database, that is, it
does not validate the entry.
See Save.
Replaces the displayed record with previously saved
version. This feature is useful if you have made unwanted
changes since you last saved the record.
Prepares the entry for submission to the database.
See
Submitting the finished record to the database
in the Sequin help documentation.
Prints the window which is currently selected. The
selected window can be one of the Sequin forms or pages,
or the help documentation.
Exit from Sequin.
Copy the selected item.
Clear the selected item.
Duplicates the selected feature.
To edit a single sequence, select the sequence identifier
in the Target Sequence pop-up menu, and click on Edit
sequence. The sequence editor will be launched for that
sequence. The sequence editor
is discussed in more detail below.
To edit a set of aligned sequences, select All Sequences
in the Target Sequence pop-up menu and select Alignment
in the Display Format. Highlight the alignment by clicking
inside of the box surrounding the sequence bars, and
click on Edit alignment. The alignment editor will be
launched. The
alignment editor is discussed in more detail below.
Opens up the Submission Instructions form, which allows
you to enter additional information about the person
submitting the record. Much of this information was
entered on the first form in Sequin, the Submitting
Authors form.
You can also save the information from the Submitting
Authors form here, so that you can use it in subsequent
Sequin submissions. Click on "Edit Submitter Info",
and, under the File menu in the resulting Submission
Instructions form, click on Export Submitter Info to
save the information to a file. For subsequent Sequin
submissions, if you have already saved the submitter
information, click on Import Submitter Info under the
File menu on the Submission page of the Submitting Authors
form.
Indicate the type of submission. If it is a new submission,
select New. If you are updating an existing submission
in order to resubmit it to the databases, select Update.
Check either the "Yes" or "No" radio button to indicate
if the record should be released before publication.
If you select "Yes," the entry will be released to the
public after the database staff has added it to the
database. If you select "No," fields will appear in
which you can indicate the date on which the sequences
should be released to the public. The submission will
then be held back by the database staff until formal
publication of the sequence or GenBank Accession Number,
or until the Release Date, whichever comes first. If
you have any special instructions, enter them in the
box at the bottom of the page.
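The "whichever comes first" rule above can be sketched in a few
lines of Python. This is only an illustration of the rule as
stated here, not code used by Sequin or by the database staff;
the function name, the parameter names, and the dates are all
hypothetical.

```python
from datetime import date


def earliest_public_release(release_date, publication_date=None):
    """Return the earlier of the requested Release Date and the date of
    formal publication (of the sequence or its accession number).

    publication_date is None while nothing has been published yet, in
    which case the entry is held until the requested Release Date.
    """
    if publication_date is None:
        return release_date
    return min(release_date, publication_date)


if __name__ == "__main__":
    # Hypothetical dates, for illustration only.
    requested = date(1998, 6, 1)
    print(earliest_public_release(requested))
    print(earliest_public_release(requested, publication_date=date(1998, 3, 15)))
```

If the sequence or its accession number is published before the
requested Release Date, the publication date takes precedence.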
Update the name, affiliation, or contact numbers of
the person submitting the record.
Update the names and affiliation of the people who
should receive scientific credit for the generation
of sequences in this entry. The address should list
the principal institution in which the sequencing and/or
analysis was carried out. If multiple labs were involved
in the project, this page should contain information
about the workplace of the senior author. If you are
submitting the record as an update to the databases,
explain the reason for the update on the Description
subpage.
This selection allows you to make changes to your sequence
by replacing it with another sequence, merging two sequences
which overlap at their ends, or by copying features
from one sequence to another. The new sequence and associated
annotations will be imported into Sequin, aligned with
the original sequence, and then you choose whether you
want to merge the sequences or the features.
First, select the format of the new sequence. The new
sequence must have a different Sequence Identifier from
the old sequence. Use Read FASTA file to import a sequence
in FASTA format. Use Read Sequence Record to import
a sequence in ASN.1 format (for example, a sequence
record which has already been saved in Sequin). If you
are running Sequin in Network
Aware mode, you can use Download Accession to import
a record from Entrez. Finally, if you have done a PowerBLAST
search, you can use Selected Alignment to import a sequence
from the alignment that you have selected in the Graphic
or Alignment view.
In all cases, the imported and original sequences must
have a region of sequence similarity which is high enough
for the two sequences to be aligned by BLAST using default
parameters.
After you import the new sequence, a new window will
open which displays a graphical view of the old (target)
and new sequences and their associated features. You
can choose to Replace the target sequence with the new
one, add the 5' end of the new sequence onto the target
(Merge 5p), or add the 3' end of the new sequence onto
the target (Merge 3p). Choose Copy features to copy
the features, but not the sequence, from the new sequence
onto the target. You may wish to Preview alignment before
making any changes to see the alignment of the two sequences.
If you are importing a Sequence Record, Downloading
an Accession, or Selecting an Alignment, you will be
asked whether you wish to retain the publications which
are on the source sequence record and have them apply
to the appropriate range on the new record.
Under this command, you can find and replace strings
of letters in those fields of your submission that contain
manually entered data. The fields which can be altered
are Locus, Definition, Accession, Keywords, Source,
Reference and Features. To use this option, select Find
and fill the Find and Replace lines with the appropriate
text. Note that you cannot edit the sequence in this
way.
Under this command, you can find strings of letters
in all fields of your submission.
This option detects discrepancies between the format
of your submission and that required by the database
selected for entry. If discrepancies are present, it
suggests ways in which to correct them. See the topic
on
Submitting the finished record to the database
in the Sequin help documentation.
Performs a spelling check on the record.
Performs a PowerBLAST search of the selected sequence
against the NCBI's sequence databases. PowerBLAST is
a version of the
BLAST sequence similarity search. In order to do
a PowerBLAST search, Sequin must be in its network-aware
mode. This search will take a few minutes.
PowerBLAST can perform a number of different searches
of the nucleotide or protein databases. You can search
a nucleotide sequence against a nucleotide database
(blastn), a six-frame translation of a nucleotide sequence
against a protein database (blastx), a protein sequence
against a protein database (blastp), or a protein sequence
against a six-frame translation of a nucleotide database
(tblastn). Most NCBI-supported databases can be searched
from within Sequin. The sequence that is used as the
search sequence is the one selected under the Target
Sequence pop-up. If there is a set of sequences in the
record, and All Sequences is selected, the BLAST search
will be performed with all sequences.
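The four search types described above amount to a lookup on the
type of the query sequence and the type of the database. The
following Python sketch restates that lookup. It is not Sequin
code, and the dictionary and function names are hypothetical.

```python
# (query sequence type, database type) -> search program, as described above.
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",    # query translated in six frames
    ("protein", "protein"): "blastp",
    ("protein", "nucleotide"): "tblastn",   # database translated in six frames
}


def programs_for_query(query_type):
    """List the programs that accept a query of the given type."""
    return sorted(
        program for (query, _database), program in PROGRAMS.items()
        if query == query_type
    )


if __name__ == "__main__":
    print(programs_for_query("nucleotide"))  # ['blastn', 'blastx']
    print(programs_for_query("protein"))     # ['blastp', 'tblastn']
```

This mirrors the behaviour described for the PowerBLAST form, where
only the programs appropriate for the type of the target sequence
can be selected.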
To carry out a search, select Search-->PowerBLAST.
Check the box next to the program you want to use. You
will only be able to select those programs which are
appropriate for the type of sequence(s) in the Target
Sequence pop-up (i.e., a nucleotide search for a nucleotide
sequence). Next, select the database. You can modify
the BLAST parameters by typing in numbers by hand, or
by selecting a Stringent, Normal, or Relaxed search
in the Sensitivity pop-up menu.
PowerBLAST allows you to limit a search either for
or against an organism or taxonomic group. Under Organism
Filter, click on "Restrict to" to limit your search
to a particular organism. Or, conversely, click on "Filter
against" to search against all organisms except one.
Type the scientific name of the organism (e.g., Homo
sapiens) or taxonomic group (e.g., Mammalia) in the
"Name" box.
The results of the PowerBLAST search will be displayed
in the record viewer, in the Summary, Graphic, and Alignment
Display Formats. Double click on a sequence to launch
the Entrez record viewer for the sequence. If you have
run a blastn search, and have an output of nucleotide
sequences, you can see the alignment of all the sequences.
Click on any sequence in the record viewer, and select
Edit Alignment under the Edit menu.
If you do a PowerBLAST search on a sequence that was
not downloaded from Entrez (i.e., if the sequence does
not have a gi number), additional controls will be added
to the bottom of the window. These controls allow you
to retrieve the PowerBLAST hits from Entrez, and then
look for Entrez neighbors. Use the alignment pop-up
to select the type of alignment (search) that was performed.
Then click on the Retrieve button to retrieve the records
in an Entrez window. In the new window, you can view
Medline, Protein, Nucleotide, Structure, and Genome
neighbors of the sequence(s). Click on the Refine button
to open a second Entrez window. In this window, you
can further refine the PowerBLAST hits by selecting
other Entrez terms, such as Author name to view sequences
belonging to a specific author.
The results of the PowerBLAST search are for your use
only. If you submit a record that contains PowerBLAST
results, the database staff will remove the hits from
the record before releasing it to the public.
This option is only available if you are running Sequin
in its network-aware mode.
The vector screen will perform BLAST sequence comparisons
between your sequence and the vector and mitochondrial
sequence databases maintained by the NCBI. If you have
indicated that the sequence is mitochondrial in origin,
however, the search against the mitochondrial database
will not be performed. If you have multiple or long
sequences, this search may take a few minutes. When
the search is complete, a window will appear which lists
significant hits between your sequence and those in
the vector and mitochondrial databases. If this analysis
indicates that your sequence contains any unwanted vector
or mitochondrial sequence, please remove the contamination
before you submit the sequence to the databases. Now
that your sequence is in Sequin, you can edit it in
the Sequence Editor.
Note that this information is for your use only. When
you are finished looking at the analysis, close the
window. Do not try to submit these results along with
your sequence.
The ORF Finder shows a graphical representation of
all the open reading frames (ORFs) in the nucleotide
sequence. This tool allows you to select ORFs and have
them appear as coding sequence (CDS) features on the
sequence record.
The ORFs, indicated by colored boxes, are defined as
the longest sequence which stretches from a start codon
to a stop codon. If the entire nucleotide sequence is
an open reading frame, but does not contain an initial
start or a terminal stop codon, it will be indicated
as an ORF as well. All six reading frames are shown;
the top three boxes represent the plus strands, and
the bottom three boxes the minus strands. The nucleotide
sequence intervals of the ORFs are displayed in descending
length order on the right side of the window. Intervals
on the complementary (minus) strand are indicated by
a c. ORFs can be selected by clicking either directly
on them or on the sequence interval. The ORF length
button selects the length of ORFs which are displayed.
For example, the default of 10 shows all ORFs which
are greater than 10 nucleotides in length. Clicking
on the box labelled ORF changes the display; potential
start codons are indicated in white, and stop codons
in red. ORFs can be selected in this display also. The
definition of start and stop codons is dependent on
the genetic code which was selected. Be sure to choose
the appropriate genetic code for translating the sequence
before opening the ORF finder.
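The kind of search described above can be sketched in Python as
follows. This is a simplified illustration and not the ORF Finder's
actual implementation. It assumes the Standard genetic code with
ATG as the only start codon and TAA, TAG, and TGA as stop codons,
it does not handle the special case of a sequence that is an open
reading frame with no start or stop codon, and all function names
are hypothetical. Intervals are reported as 1-based positions on
the submitted sequence, with "c" marking the complementary strand.

```python
START = "ATG"                  # assumption: only ATG starts an ORF
STOPS = {"TAA", "TAG", "TGA"}  # assumption: Standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def orfs_on_strand(seq, strand):
    """Scan the three reading frames of one strand for start-to-stop ORFs."""
    found = []
    n = len(seq)
    for frame in range(3):
        start = None
        for i in range(frame, n - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i      # keep the first start, giving the longest ORF
            elif start is not None and codon in STOPS:
                end = i + 3    # the interval includes the stop codon
                if strand == "+":
                    found.append((start + 1, end, "+"))
                else:
                    # Convert back to coordinates on the submitted strand.
                    found.append((n - end + 1, n - start, "c"))
                start = None
    return found


def find_orfs(seq, min_length=10):
    """Return ORFs longer than min_length nucleotides, longest first."""
    seq = seq.upper()
    orfs = orfs_on_strand(seq, "+") + orfs_on_strand(reverse_complement(seq), "c")
    orfs = [orf for orf in orfs if orf[1] - orf[0] + 1 > min_length]
    return sorted(orfs, key=lambda orf: orf[1] - orf[0] + 1, reverse=True)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC"))  # [(3, 14, '+')]
    print(find_orfs("GGCTAAAATTTCATGG"))  # [(3, 14, 'c')]
```

The second call uses the reverse complement of the first sequence,
so the same ORF is reported on the complementary strand.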
The ORF finder works in conjunction with the Sequence
Editor. Once an ORF is selected, its sequence is highlighted
in the editor. Using tools in the Sequence Editor, you
can make the highlighted sequence into a CDS, translate
it, and save it as a CDS feature in the record. See
the documentation on
Editing a CDS in the
Sequence Editor.
The Repeat Finder searches for repeated sequences,
such as Alu sequences, within human nucleotide sequences.
If any repeated sequences are found, they are indicated
as features on the database record. The search may take
up to a few minutes on longer sequences. Since the database
contains only repeated sequences from humans, the Repeat
Finder should only be used on human sequences. The NCBI
maintains a database of repeated sequences which can
be obtained from their
ftp site.
This option changes the sequence which is selected
in the Target Sequence pop-up. Type the SeqID of the
sequence in the box, and the record viewer will be updated
to display that sequence.
The Style manager allows you to choose between different
formats in which to view the Graphical Display Format.
The graphical display is selected by choosing the Graphic
display format on the record viewer. Using the Style
Manager, you can also copy the style or modify it to
suit your needs.
As a default, Sequin is available as a stand-alone
program. However, the program can also be configured
to exchange information with the NCBI (GenBank) over
the Internet. The network-aware mode of Sequin is identical
to the stand-alone mode, but it contains some additional
useful options.
Sequin will only function in its network-aware mode
if the computer on which it resides has a direct Internet
connection. Electronic mail access to the Internet is
insufficient. In general, if you can install and use
a WWW-browser on your system, you should be able to
install and use network-aware Sequin. Check with your
system administrator or Internet provider if you are
uncertain as to whether you have direct Internet connectivity.
There are two ways to change Sequin into its network-aware
mode. If you are still on the initial Welcome
to Sequin form, select Net Configure under the Misc
menu. If you have already worked on a Sequin submission,
and are looking at the record in the record viewer,
select the Net Configure option from the Misc menu.
After selecting the net configure option from either
location, click on "yes" when you are asked whether
to enable network usage and to run the configuration
program. You will see a new page, entitled "Internet
Setup Preferences" on which you must supply information
for the network configuration program. In most cases,
the default settings already filled out on this page
are sufficient. If your computer must connect to the
Internet through a firewall at your institution, you
may need to check the box marked "Outgoing Connections
Only." If you select this option, your client computer
will make only outgoing connections to the NCBI. Next,
fill out the page entitled "Dispatcher Internet Address."
Again, in most cases, the default dispatcher 130.14.25.211
is sufficient. On the next page, "Entrez Service Selection,"
select the Entrez service to receive the broadest class
of service. If, after clicking on Next Page, you receive
a message that Entrez service is not available, you may
need to go back one page and select a different dispatcher.
Once your configuration is complete, click on Accept
to save the new configuration. You must restart the
program for the changes to go into effect. If you have
problems setting up the network configuration, contact
info@ncbi.nlm.nih.gov.
If you wish to change Sequin back to its stand-alone
mode, select Net Configure again from the Misc menu.
Click on "yes" when you are asked if you wish to disable
network usage. You must restart the program for the
changes to go into effect.
The network-aware mode of Sequin allows you to perform
a number of additional, important functions. These functions
all appear as additional menu items. A brief description
of these functions follows. Further descriptions are
available as indicated elsewhere in the help documentation.
Using Sequin in its network-aware mode, you can download
an existing GenBank record from Entrez using the GenBank
accession number or GI identification number (NID).
You can then use Sequin to make any necessary changes
to the record, and resubmit it to GenBank as a sequence
update. Instructions
for submitting sequence updates are presented under
the Welcome to Sequin Form. You can download any record
from Entrez and look at it in Sequin. However, you can
perform a formal database update only on those records
which you have previously submitted yourself.
In its network-aware mode, Sequin can perform a sequence
comparison of your sequence against the nucleotide and
protein databases at the NCBI. You can use the results
of these comparisons for annotation purposes. More information
about PowerBLAST searches is available
above.
Use Sequin in its network-aware mode to screen your
nucleotide sequence for vector or mitochondrial sequence
contaminants. You can then remove any unwanted sequences
in Sequin's Sequence Editor
before submitting the sequence to the databases.
The Vector Screen is explained
under the Search menu.
In its network-aware mode, Sequin can import the relevant
sections of a Medline record directly into a sequence
submission record. Rather than typing in the entire
citation, you can enter minimal information, such as
the Medline Unique Identifier (MUID), or Journal name,
volume, year, and pages. The
Medline lookup is explained in the section of the
documentation entitled Publications.
In its network-aware mode, Sequin can look up the taxonomic
lineage of an organism from the NCBI's taxonomy database.
This lookup is normally performed by the NCBI database
staff after the record has been submitted to GenBank.
If you look up the taxonomy before submitting the sequence,
you can make a note in the record of any disagreements.
The taxonomy lookup
is explained in the section of the documentation
covering Biological Source: Organism page: Lineage subpage.
The NCBI DeskTop displays the internal structure of
the record being viewed in Sequin. The
DeskTop is explained under the Misc menu.
This option allows you to perform a query against the
NCBI's
Entrez database.
If you have downloaded a sequence from Entrez, it will
have the Entrez neighbor buttons at the bottom of the
window. To see similar nucleotide sequences, select
Nucleotide in the Target pop-up menu, and then click
on the Neighbor button to the left to get the Entrez
list of related sequences. To see the Entrez records
for any publications or CDSs in the record, select the
Medline or Protein database, and click on the Lookup
button. Any associated Genome or Structure records can
also be viewed.
This option is only available if you are running Sequin
in its network-aware mode.
The NCBI DeskTop provides a view of the internal structure
of the Sequin record, the ASN.1. Its display resembles
a Venn diagram, and shows all the structures represented
in the ASN.1 data model.
In addition, a number of undocumented software tools
from the NCBI can be accessed from the DeskTop. These
tools are components of the NCBI portable software toolkit.
You can also customise these functions using the toolkit
with your own software tools. The toolkit and its documentation
are available from the NCBI by anonymous
FTP.
The DeskTop should only be used by very seasoned users.
At this time, we are not providing any documentation
for these specialised functions.
This menu allows you to enter features and descriptors
on the sequence.
The first six options, Genes and Named Regions, Coding
Regions and Transcripts, Structural RNAs, Bibliographic
and Comments, Sites and Bonds, and Remaining Features
refer to types of Features that can be added to the
sequence. Features are described in more detail in the
above section entitled Features.
The seventh option, Publications, allows you to add
a Publication Feature or Publication Descriptor to the
record. Publications are described in more detail in
the above section entitled
Publications.
The eighth option, Descriptors, allows you to add Descriptors
to the record. Descriptors are described in more detail
in the section entitled Descriptors,
above.
The ninth option, Generate Definition Line, will generate
a title for your sequence based on the information
provided in the record. This option will work for single
sequences as well as sets of sequences, and can handle
complex annotations with multiple features. The title
will follow GenBank conventions, but may be modified
by the database staff if it is not appropriate. The
title you enter here will replace any title you entered
elsewhere in the submission, for example, any title
which was attached to the nucleotide sequence. For a
description of definition lines, see
Nucleotide Definition line (title) , above.
Use this item to change the display font. From the
pop-up menus, choose the style and size of type. For
additional changes, mark the Bold, Italic, or Underline
check boxes. The default font is 10-point Courier.
Enabled only in the Graphic view, it shows what the
various kinds of features used in the picture look like,
i.e., what colors and styles and fonts each one uses.
This editor allows you to modify the nucleotide or
amino acid sequences in your entry. For example, you
can add or remove nucleotides, or you can add or remove
CDS (coding sequence) features from the entry. If you
have an aligned set of sequences, you can use the Sequence
Alignment Editor to add new sequences, replace old
sequences, and propagate features from one sequence
to others.
Even though the Sequence Editor does allow you to undo
changes you make to the sequence, we strongly suggest
that you save a copy of the entry before launching the
Sequence Editor so that you can revert to it if necessary.
The sequence which appears in the editor is dependent
on the sequence(s) selected in the Target Sequence pop-up
menu. There are two ways to launch the sequence editor
for nucleotide sequences. First, you can double click
within the sequence in any display format of the record
viewer. A window containing the DNA sequence will appear.
Second, in the record viewer, select the sequence you
wish to edit in the Target Sequence pop-up menu. Click
on Edit Sequence under the Edit menu. You can also launch
the editor for protein sequences in two ways. First,
select the protein sequence in the Target Sequence
pop-up menu, and double click within the protein sequence.
A window containing the protein sequence will appear.
Second, select a nucleotide sequence in the Target Sequence
pop-up menu, and double click within the CDS (coding sequence)
feature to launch the Coding Region feature form (see
Features in the Sequin help
documentation). Click on "Launch Product Viewer" to
start the sequence editor. Both methods of accessing
the protein sequence editor will result in the same
display window.
The cursor can be moved with the mouse or the arrow
keys. The display window will change to show the position
of the cursor. The sequence location of the first residue
on each line is indicated on the left side of the window.
The cursor location, or the range of sequences selected
by the mouse, is shown in the upper left corner of the
window. If you want to move the cursor to a specific
location, type the number in the box on the top left
of the sequence editor window, and hit the Go to button.
If you want to look at a specific sequence, but not
move the cursor to it, type the number in the upper
right box of the window and hit the Look at button.
Select a piece of sequence by highlighting it with
the mouse. To select the entire sequence, click on a
sequence location number on the left side of the window.
Any sequence that is highlighted in the Sequence Editor
will show up as a box on the sequence when it is viewed
in the Graphic Display Format.
One way to insert and delete residues is to type directly
in the window. Move the cursor to the appropriate location
with the mouse and type.
Text will be inserted to the left of the cursor. Delete
sequence with the backspace or delete key. Text will
be deleted to the left of the cursor. To delete a block
of sequence, highlight it with the mouse and use the
delete or backspace key.
Another way to insert and delete residues is with options
under the Edit menu of the Sequence Editor. Use Cut
to remove, or Copy to copy, highlighted residues. Paste
these sequences anywhere. Use Clear to permanently remove
highlighted residues.
To save changes you have made to the sequence, press
the Accept button at the bottom of the Sequence Editor
display window. If you do not wish to save the changes,
press the Cancel button at the bottom of the Sequence
Editor display window. Selecting either Accept or Cancel
will quit the Sequence Editor and return you to the
record viewer. Please note that any changes you make
will not become a permanent part of the Sequin record
until you Save the record in the record viewer. The
Save button at the bottom of the Sequence Editor display
window is used only to save a CDS feature.
The default sequence displayed for a nucleotide sequence
is the coding strand. If you wish to see the complement
of this sequence, that is, if you wish to see the double
stranded version of this sequence, select Complement
under the Sequence Editor View menu. You can also elect
to see the translation of the top strand. Select Reading
Frames + under the Sequence Editor View menu to see
the three phase translation of the upper (coding) nucleotide
strand. Select Reading Frames - under the Sequence Editor
View menu to see the three phase translation of the
lower (noncoding) nucleotide strand. Methionine residues
are colored. Complement, Reading Frames +, and Reading
Frames - can be selected simultaneously or individually.
Only the top nucleotide strand can be edited. Any changes
made in this strand are reflected in the Complement
as well as in the Reading Frames.
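The relationship between the top strand, its complement, and the
reading frames can be sketched in a few lines of Python. This is only
an illustration, not Sequin's own code: the function names are
hypothetical, the standard genetic code is assumed, and codons
containing ambiguous bases are shown as X.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def complement(seq):
    """Base-by-base complement, as it would be drawn under the top strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))


def reverse_complement(seq):
    """The lower strand read in its own 5' to 3' direction."""
    return complement(seq)[::-1]


def translate(seq, frame=0):
    """Translate seq starting at offset frame (0, 1 or 2)."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(seq):
    """Return (plus, minus): three translations for each strand."""
    plus = [translate(seq, f) for f in range(3)]
    minus = [translate(reverse_complement(seq), f) for f in range(3)]
    return plus, minus
```

Because the complement and the translations are all computed from the
top strand, editing that one strand is enough to change every other
row of the display.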
A powerful feature of the Sequence editor is that it
allows you to make new CDS (coding sequence) features
on the nucleotide sequence. To make a new CDS feature,
select the residues, and choose Make CDS +, to make
a CDS from the upper strand, or Make CDS -, to make
a CDS from the lower strand, under the Sequence Editor
Features menu. The CDS will be indicated by a colored
bar. The bar will appear under the highlighted sequence.
The CDS can be selected by clicking either on it or
on the word CDS in the left margin. To remove a CDS,
select it and click on Clear under the Sequence Editor
Edit menu. Be sure to select the CDS only; if the nucleotide
sequence is highlighted, it will be deleted as well.
You can also have Sequin find coding sequences for
you by using the ORF Finder, located under the Search
menu of the record viewer. Click on ORF Finder to find
the ORFs (open reading frames) in your sequence. The
ORF Finder is described in more detail
above. In the ORF Finder, click on the ORF you
want to add to the sequence. This ORF will be highlighted
on the sequence when it is viewed in the sequence editor.
You can then make the sequence into a CDS by following
the above instructions.
Translate the coding sequence by clicking on the Translate
button on the bottom of the Sequence Editor window.
The translation will appear under the CDS bar. You can
change the length of the CDS by grabbing the bar at
one end and shrinking or expanding it. Move the CDS
by grabbing the bar in the middle and moving it. The
translation will move along with the CDS.
Until you save a feature, it is considered a "virtual"
feature and is not added to the record. To see the features
which are an integral part of the record, click on the
Show feat. button at the bottom of the Sequence Editor
window. To hide the features again, click on the Hide
feat. button. If you have not changed the default colors,
the saved features will be pink or black, and the "virtual"
features will be green. To save a new CDS and make it
an integral part of the record, click on the Save button
at the bottom of the Sequence Editor window. This action
will launch the Coding Region (CDS)
feature form. At minimum, enter the name of the protein
on the Protein subpage of the Coding Region page. For
more detailed instructions, see the CDS feature, above.
After you click Accept on the Coding Region form, Sequin
will accept the CDS as a new feature, and the record
viewer and other windows will be brought up to date.
The color of the CDS will change to pink. A graphical
representation of features can be seen by selecting
the Graphic Display Format in the record viewer.
Saved CDS features can also be edited. You can alter
the length or location of a saved CDS as described above.
However, a saved CDS cannot be removed in the Sequence
Editor window. To remove a saved feature, go to the
Graphic Display Format of the record viewer, select
the CDS, and choose Clear under the Edit menu.
When you add or remove nucleotide sequence in a region
within a CDS, you can choose whether the CDS should
be interrupted by these changes. On the main Sequin
window, select Split feature mode to have the CDS interrupted
by the inserted or deleted sequence. Select Merge feature
mode to have the CDS incorporate the changed sequence.
Sequin allows you to work with aligned sets of closely
related nucleotide sequences which are part of a population,
phylogenetic, or mutation study. If the sequences are
imported in a pre-aligned format, such as PHYLIP, Sequin
uses this alignment. If the sequences are imported individually
in FASTA format, Sequin generates its own alignment.
You can view the aligned sequences in the Sequence
Alignment Editor. In the record viewer, select All Sequences
in the Target Sequences menu, and select the Alignment
Display Format. Highlight the alignment by clicking
inside of the box surrounding the sequence bars. Then
select Edit alignment from the Edit menu. The aligned
sequences can be viewed in a number of different formats.
See instructions for the Sequence Editor Alignment menu,
below.
If you imported a set of nucleotide sequences, you
may want to add a CDS (coding sequence) feature to one
or more of the sequences. You must first add the CDS
feature to a single sequence (see Editing a CDS,
above ). In order to access the Sequence Editor
for a single sequence, double click on the name of that
sequence in the Alignment view. You can then propagate
the feature to other sequences (see propagate under
the Features menu, below
).
Using the sequence editor, you can also import new
sequences into Sequin and align them with the pre-existing
sequences. To align a new sequence with a single sequence
in Sequin, choose Align with
under the Sequence Editor Edit menu. To align a new
sequence with a set of pre-aligned sequences, choose
Align under the Sequence Editor
Alignment menu. You can also propagate
features between the aligned sequences.
Undoes the previous action.
Removes the highlighted sequence. This sequence can
be pasted elsewhere.
Pastes a cut or copied sequence to the right of the
cursor.
Copies the highlighted sequence. This sequence can
be pasted elsewhere.
Removes the selected sequence.
Refreshes window by reloading the data. Note that this
option does not undo any editing.
The Find command allows you to find DNA or amino acid
sequence patterns in your sequence. The search is case
insensitive. To find an exact match to a DNA sequence
pattern, type the pattern in the box. You can also specify
non-exact patterns. To find the reverse complement of
the pattern, click on the box. For example,
TCAGGGC finds the sequence TCAGGGC
[TCA]CAGGGC finds T or C or A followed by CAGGGC
NCAGGGC finds T or C or G or A followed by CAGGGC
TCA(3)GC finds the sequence TCAGGGC
TCA(1:3)GC finds the sequences TCAGC, TCAGGC, and TCAGGGC
TCA(1:3)NC finds the sequence TCA, followed by 1-3
occurrences of G, A, T, or C, followed by C, e.g., TCATC
or TCATTC or TCAATGC
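One way to read the DNA pattern syntax is as a small regular
expression language. The Python sketch below converts a pattern into
a standard regular expression and lists the matches. It is inferred
only from the examples above (a count in parentheses applies to the
symbol that follows it); the function names are hypothetical and
Sequin's actual matcher may behave differently, for example with
overlapping matches.

```python
import re


def sequin_pattern_to_regex(pattern, wildcard="N", any_class="[ACGT]"):
    """Convert a Find pattern such as TCA(1:3)NC into a Python regex."""
    pattern = pattern.upper()
    out = []
    repeat = ""          # pending {n} or {m,n} for the next symbol
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "(":    # (n) or (m:n): count for the following symbol
            end = pattern.index(")", i)
            low, _, high = pattern[i + 1:end].partition(":")
            if high:
                repeat = "{%d,%d}" % (int(low), int(high))
            else:
                repeat = "{%d}" % int(low)
            i = end + 1
            continue
        if ch == "[":    # [TCA]: any one of the listed symbols
            end = pattern.index("]", i)
            token = pattern[i:end + 1]
            i = end + 1
        else:            # literal symbol, or the wildcard
            token = any_class if ch == wildcard else re.escape(ch)
            i += 1
        out.append(token + repeat)
        repeat = ""
    return "".join(out)


def find_all(pattern, sequence):
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(sequin_pattern_to_regex(pattern), re.IGNORECASE)
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]
```

For example, find_all("TCA(1:3)GC", sequence) reports every place where
TCA is followed by one to three G residues and then a C, regardless of
case.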
To find an exact match to an amino acid sequence pattern,
type that sequence in the box, and click on "translate
sequence". Sequin will look for all occurrences of that
pattern in all three plus strand open reading frames.
The open reading frames will be shown, and the DNA sequence
encoding that protein sequence will be highlighted.
You can also specify non-exact patterns. For example,
CDLPEYC finds the sequence CDLPEYC
[CRQ]DLPEYC finds C or R or Q followed by DLPEYC
XDLPEYC finds any amino acid followed by DLPEYC
CDL(3)EYC finds the sequence CDLEEEYC
CDL(1:3)PE finds the sequences CDLPE, CDLPPE, and CDLPPPE
CDL(1:3)XE finds the sequence CDL, followed by 1-3
occurrences of any amino acid, followed by E, e.g.,
CDLAAE, CDLRSE, or CDLAPQE
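The amino acid search can be pictured as translating the three plus
strand reading frames and mapping each protein match back to the DNA
that encodes it. The Python sketch below illustrates the idea. It is
not Sequin's implementation: the function names are hypothetical, the
standard genetic code is assumed, and for brevity the pattern is given
as an ordinary Python regular expression (so [CRQ]DLPEYC works as
written, but X and the (1:3) counts would have to be converted first).

```python
import re
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame):
    """Translate dna from offset frame (0, 1 or 2); unknown codons give X."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def find_protein_pattern(dna, regex):
    """Return (frame, dna_start, dna_end, peptide) for each match.

    frame is 1, 2 or 3; DNA coordinates are 1-based and inclusive.
    """
    hits = []
    for frame in range(3):
        protein = translate(dna, frame)
        for m in re.finditer(regex, protein):
            dna_start = frame + 3 * m.start() + 1
            dna_end = frame + 3 * m.end()
            hits.append((frame + 1, dna_start, dna_end, m.group()))
    return hits
```

The DNA interval returned for each hit corresponds to the stretch of
nucleotides that would be highlighted for that match.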
Find the previous occurrence of a pattern.
Find the next occurrence of a pattern.
This option will remove N's from the ends of a sequence.
If your sequence is part of an alignment, the alignment
will be automatically recalculated after the N's are
removed.
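The effect of this option on the sequence itself can be sketched in
Python as follows. The function name is hypothetical, and the
recalculation of an alignment is not shown.

```python
def trim_terminal_ns(seq):
    """Remove runs of N (either case) from both ends of seq.

    N's inside the sequence are left untouched.
    """
    return seq.strip("Nn")
```

Only the terminal runs are removed; an N in the middle of the sequence
stays where it is.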
This option allows you to import an additional sequence
into the Sequence Editor, and align it with the sequence
that is already there. This new sequence can be used
as a reference, or it can replace an existing sequence
in the record.
To import the new sequence into Sequin, click on Align
with under the sequence editor Edit menu. Choose FASTA
or ASN.1, depending on whether the sequence is in FASTA
or ASN.1 format. (Your sequence is probably in FASTA
format, described
above, unless you have downloaded it from Entrez
and specified that it should be saved as ASN.1 format.)
The sequence must be closely related to the sequence
already in the record. For example, the sequence could
be slightly longer or shorter than the existing sequence,
or it could have single base pair changes. The new sequence
will be aligned with the existing sequence using a global
alignment algorithm. The alignment will appear in a
new window. If you wish, you can propagate features,
such as a CDS, from the new sequence to the original
sequence. Instructions on propagating features are found
below.
To replace the old sequence with the new sequence,
go back to the Sequence Editor Edit menu and click on
Replace. You will be asked if you really want to replace
the old sequence with the new one. If you do, click on
Proceed, and the replacement will occur. If you do not
wish to replace the old sequence with the new one, click
on Dismiss. To remove the new sequence, click Dismiss
in the alignment editor window. You will be returned
to the sequence editor view of the original sequence.
You may also wish to merge two sequences, if, for example,
you have two sequence files, one which contains the
coding sequence of a gene, and another which contains
the 5' UTR of the gene along with the first 50 nucleotides
of coding sequence. In this case, go back to the Sequence
Editor Edit menu and select Merge. You will be given
the choice of merging the 5p (5 prime) or 3p (3 prime)
end of the new sequence with the old.
Allows you to change how the Sequence Identifiers are
displayed. Normally, sequence identifiers are only displayed
for aligned sequences. If the sequences have been downloaded
from Entrez, and have different names in their definition
lines, you can change which name you see. You can view
each sequence labelled by the following names: FASTA
short, FASTA long, Locus, Accession, or Report.
Shows the complement of the submitted strand underneath
the original.
Shows the indicated phase translation of the selected
coding sequence. You can select any or all of the six
reading frames.
Allows you to change the style of the display, including
the colors, font type, and font size.
Allows you to choose between three styles in which
to view the coding sequence translation. The default
style is to see all amino acids. If you select the ***
option, Sequin will display all methionine residues as
M and all stop codons as *. If you select the orf option,
Sequin will show all open reading frames by connecting
the M and * in a single reading frame with a ~.
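The two alternative translation styles can be approximated in Python
as shown below. This is a sketch of the idea only. It assumes that
residues which are not displayed are replaced by blanks, and that an
open reading frame runs from a methionine to the next stop codon;
the function names are hypothetical.

```python
import re


def style_stars(protein, blank=" "):
    """Keep only M and * from a translated frame; blank out the rest."""
    return "".join(ch if ch in "M*" else blank for ch in protein)


def style_orf(protein, blank=" "):
    """Draw each stretch from M to the next * as M~~~*."""
    out = [blank] * len(protein)
    for m in re.finditer(r"M[^*]*\*", protein):
        length = m.end() - m.start()
        # Same length as the match: M, a run of ~, then the stop codon
        out[m.start():m.end()] = "M" + "~" * (length - 2) + "*"
    return "".join(out)
```

Both functions take one translated reading frame as a string, so they
can be applied to each of the frames in turn.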
Allows you to change the color by which sequence conservation
is displayed in an alignment. It is used in conjunction
with the Pretty mode in the Sequence Alignment Editor.
The export function allows you to save the sequence
to a file. Text saves the sequence as a text file. FASTA
saves the sequence in FASTA format. Select the range
of sequence you want to export by typing in the location
in the dialog box.
The Features menu changes depending on whether you are
viewing a single sequence or a set of aligned sequences.
If you are viewing a single sequence, the menu contains
a long list of all features that can be annotated on
a sequence. These features are the same as those that
are accessible through the main Sequin Annotate menu.
You can annotate features either in the Annotate menu
or in the Sequence Editor. If you annotate them in the
Annotate menu, you must provide the nucleotide sequence
location of the feature. However, if you add features
from the Sequence Editor, you do not need to know their
nucleotide coordinates. Simply highlight the sequence
which the feature covers, and the location of the sequence
will be automatically entered in the feature location
box. Additional explanations of how to annotate features
are provided in the section on
Features.
If you are viewing a set of aligned sequences, you
will see the following three menu choices:
Create a coding sequence from the upper (+) strand
of the selected sequence. For more details on working
with a CDS, see
Editing a CDS (coding sequence), above.
Create a coding sequence from the lower (-) strand
of the selected sequence. For more details on working
with a CDS, see
Editing a CDS (coding sequence), above.
The propagate command propagates features, such as
gene, mRNA, or CDS, from one sequence in the set to
one or more other sequence(s). For example, if one nucleotide
sequence in the alignment contains a CDS feature, you
can propagate a similar CDS, over the same interval,
to the other nucleotide sequences in the set. The exact
amino acid sequence of the new CDS will depend on the
nucleotide sequence of the individual nucleotide sequences.
Select the feature you want to propagate by clicking
on the feature in the Select source Features box. You
can also select the feature by clicking on the sequence
in the Select source sequences menu which contains that
feature. Next, select the target sequences, the sequences
you want the feature to be propagated to. Click while
holding down the Control key on the keyboard to select
multiple targets. Using the radio buttons, select whether
you want to split gaps or merge gaps. If you select
split gaps, the feature will be split around any gaps
in the sequence, resulting in multiple features. If
you select merge gaps, the feature will be propagated
across the gap, resulting in a single feature. To complete
the task, select Propagate.
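The difference between split gaps and merge gaps can be illustrated
with a short Python sketch that maps one feature interval from a
source sequence to a target sequence through their alignment. This is
an assumption-laden illustration, not Sequin's algorithm: the function
name is hypothetical, gaps are written as "-", coordinates are 1-based
and inclusive, and a gap in either row is treated as a break in split
mode.

```python
def propagate_interval(src_row, tgt_row, start, end, mode="merge"):
    """Map source interval (start, end) onto the target sequence.

    src_row and tgt_row are two rows of the same alignment.
    Returns a list of (start, end) intervals in target coordinates.
    """
    if len(src_row) != len(tgt_row):
        raise ValueError("aligned rows must have the same length")
    if mode not in ("merge", "split"):
        raise ValueError("mode must be 'merge' or 'split'")
    src_pos = tgt_pos = 0
    pieces = []       # finished pieces on the target
    current = None    # piece currently being extended
    for s, t in zip(src_row, tgt_row):
        if s == "-" and t == "-":
            continue                      # column empty in both rows
        if s != "-":
            src_pos += 1
            inside = start <= src_pos <= end
        else:
            inside = start <= src_pos < end   # gap inside the feature
        if t != "-":
            tgt_pos += 1
        if not inside:
            continue
        if s != "-" and t != "-":         # aligned residues: extend
            if current is None:
                current = [tgt_pos, tgt_pos]
            else:
                current[1] = tgt_pos
        elif mode == "split" and current is not None:
            pieces.append(tuple(current)) # gap: close the piece
            current = None
    if current is not None:
        pieces.append(tuple(current))
    return pieces
```

With merge gaps the result is always a single interval that spans any
gap; with split gaps each gap inside the feature starts a new
interval, which corresponds to multiple features.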
These menu choices are only available when an alignment
is being edited. Alignments can be generated when a
set of sequences from a phylogenetic, population, or
mutation study is submitted. They can also be made by
importing additional reference sequences into Sequin
with the Align with option
under the Sequence Editor Edit menu.
Select all nucleotide sequences.
Select all coding sequences.
Select which of the aligned sequences should be the
master sequence. Click on the desired sequence in the
sequence editor, then choose Select master to change
the display. By default, the master sequence is the
first sequence listed. The Sequence Identifier of the
master sequence is indicated in color.
Changes the way the sequences are displayed. When Show
all is selected (that is, Show differences is visible),
the entire sequence of each entry is displayed. When
Show differences is selected, the entire sequence of
the master sequence is shown. The sequences of the aligned
entries are shown as dots where they are identical to
the master sequence, and letters where they are different.
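The Show differences display can be reproduced with a few lines of
Python. This sketch assumes the rows are already aligned and of equal
length, compares residues without regard to case, and leaves gap
characters in the aligned rows as they are; the function name is
hypothetical.

```python
def show_differences(master, aligned_rows):
    """Replace residues identical to the master sequence with dots."""
    shown = []
    for row in aligned_rows:
        shown.append("".join(
            "." if a.upper() == b.upper() and b != "-" else b
            for a, b in zip(master, row)))
    return shown
```

The master sequence itself is displayed in full; only the other rows
are passed through this function.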
This option shows the result of a dot matrix plot between
two selected sequences in the alignment. It is under
development.
Note: This item is under development. Save your Sequin
record before using any of the features under this selection.
Align allows you to carry out two main functions. You
can recalculate the sequence alignment by choosing among
three different algorithms. You can also import new
sequences into the record and align these sequences
with each other as well as with the existing sequences.
You can use one of three algorithms to recalculate
the sequence alignment. Sim3 is the least stringent method
and is used for sequences which are highly similar.
Sim is the most stringent method and is used for sequences
which are less similar. Sim2 is in the middle. First,
in the Sequence Alignment Editor, select the sequences
you wish to align. Select all sequences by choosing
Select all under the Alignment menu, or select a subset
by holding down the control button as you click on them.
Next, select Align under the Alignment menu, and click
on the desired algorithm. The alignment will open in
a new window. Select Accept to incorporate the new alignment.
If the sequences which you wish to import into the
alignment are already in files on your computer, they
must be in FASTA or ASN.1 seq-entry format. Sequences
can be obtained in ASN.1 format from Entrez. Alternatively,
if you are running Sequin in its
network-aware mode, the program can look up the
sequence directly from Entrez. You must have a file
on your computer which contains a list of accession
numbers or GI numbers (a type of accession number).
This list should be in the format of
gi|#1
gi|#2
gi|#3
for GI numbers or
gb|#1
gb|#2
gb|#3
for accession numbers.
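If you keep your identifiers in a script or spreadsheet, a list file
in the layout shown above can be produced with a short Python helper
such as the one below. The function names and the identifiers in the
usage note are placeholders for illustration, not real records, and
whether a trailing newline matters to Sequin is not stated here.

```python
from pathlib import Path


def format_id_list(ids, kind):
    """Return one identifier per line, prefixed gi| or gb|.

    kind is "gi" for GI numbers or "accession" for accession numbers.
    """
    prefix = {"gi": "gi|", "accession": "gb|"}[kind]
    return "".join(prefix + str(identifier) + "\n" for identifier in ids)


def write_id_list(path, ids, kind):
    """Write the formatted list to a plain text file."""
    Path(path).write_text(format_id_list(ids, kind))
```

For example, write_id_list("ids.txt", [11111, 22222], "gi") writes a
file with the two lines gi|11111 and gi|22222.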
You can choose the algorithm to use when recomputing
the alignment of the imported sequences. Use Sim2 in
most cases. Use Sim3 if the nucleotide sequences are
highly similar.
After you have imported the sequences into Sequin,
you can propagate features between the old and the new
sequences (see above ).
This selection provides a brief report about the alignments.
It provides a matrix with the number of mismatches and
number of gaps between the sequences in the alignment.
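A report of this kind can be computed directly from the aligned rows.
The Python sketch below shows one plausible way to count mismatches
and gaps for every pair of sequences. The counting rules are
assumptions (a column counts as a gap when exactly one of the two
sequences has "-", and as a mismatch when both have residues that
differ), and the function name is hypothetical.

```python
def pairwise_report(rows):
    """Return (mismatches, gaps) matrices for a list of aligned rows."""
    n = len(rows)
    mismatches = [[0] * n for _ in range(n)]
    gaps = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for a, b in zip(rows[i].upper(), rows[j].upper()):
                if a == "-" and b == "-":
                    continue              # empty in both sequences
                if a == "-" or b == "-":
                    gaps[i][j] += 1
                elif a != b:
                    mismatches[i][j] += 1
    return mismatches, gaps
```

Both matrices are symmetric, with zeros on the diagonal, so entry
[i][j] describes the comparison of sequence i with sequence j.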
Moves the cursor to the indicated location.
Moves the window to the indicated location without
moving the cursor.
In merge mode, any new sequence which is entered into
a region spanned by an existing feature becomes part
of that feature. For example, if you enter new sequence
in the middle of a CDS, that sequence will be translated
as part of the CDS. In split mode, the new sequence
interrupts the feature. For example, if you enter new
sequence in the middle of a CDS, the CDS will be interrupted
by that sequence (see the location of the CDS in the
record viewer).
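The effect of the two modes on a feature location can be sketched in
Python. In this illustration a feature is a list of 1-based,
inclusive (start, end) intervals, and new sequence of a given length
is inserted between positions pos and pos + 1. The function name and
the representation are assumptions made for the example.

```python
def insert_into_feature(intervals, pos, length, mode):
    """Return the feature intervals after an insertion.

    mode is "merge" (the feature absorbs the new sequence) or
    "split" (the new sequence interrupts the feature).
    """
    if mode not in ("merge", "split"):
        raise ValueError("mode must be 'merge' or 'split'")
    updated = []
    for start, end in intervals:
        if pos < start:        # insertion upstream: shift the interval
            updated.append((start + length, end + length))
        elif pos >= end:       # insertion downstream: unchanged
            updated.append((start, end))
        elif mode == "merge":  # insertion inside: the interval grows
            updated.append((start, end + length))
        else:                  # insertion inside: two intervals
            updated.append((start, pos))
            updated.append((pos + length + 1, end + length))
    return updated
```

For a CDS at 10..30 and six new residues inserted after position 20,
merge mode gives 10..36, while split mode gives 10..20 and 27..36,
leaving the new residues 21..26 outside the feature.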
This option is only visible in the Alignment Editor.
It allows you to change the format in which the alignment
is displayed. Pretty mode shades the alignment with
different colors for different ranges of sequence conservation.
The colors can be changed with the Color option under
the Sequence Editor View menu.
This box toggles between hiding and showing the features
on a sequence. To hide the features, click on the box
when it is called Hide feat. To show features, click
on the box when it is called Show feat.
Translates the selected CDS. For more details on working
with a CDS, see
Editing a CDS (coding sequence), above.
Saves the selected CDS as part of the entry. For more
details on working with a CDS, see
Editing a CDS (coding sequence), above.
Refreshes the Sequence Editor window by reloading the
data. This option does not undo any editing.
Closes the Sequence Editor after saving all of the
changes made to sequences and features.
Closes the Sequence Editor without saving any changes
made to sequences or features.
Credits: Tyra Wolfsberg
Comments and questions to: info@ncbi.nlm.nih.gov
(NCBI) or http://www.ebi.ac.uk/support/
(EBI)
|