This file lists changes that have been made to the Sequin program.
Where appropriate, there are also links to the relevant section of the Sequin help documentation.
Version 2.80--January 27, 1999
- Setting up Sequin to communicate over the network is now much easier.
Sequin can function as either a stand-alone or network-aware program.
The stand-alone version is all that is needed to perform most sequence
submissions. In its network-aware mode, Sequin can also communicate
with the NCBI to download sequences from Entrez, perform Power-BLAST
searches and Entrez queries, and screen for the presence of vector
sequences or repeat elements.
The Network Configuration program is located under the Misc menu, both
on the initial Welcome to Sequin page and in the record viewer. Most
users can select a "Normal" connection and click on Accept to begin the
configuration. If you are behind a firewall, you may need to contact
your system administrator in order to fill in the Proxy and Port
fields. Users outside the United States or with a slow or unreliable
Internet connection may need to increase the Timeout, the length of time
for which Sequin will wait for a response from the network.
- Asnload files are no longer included in the Sequin distribution
Due to improvements in NCBI services, Asnload files are no longer
necessary on any platform for recent NCBI software. Therefore, when you download and
install the new version of Sequin, the Asnload folder will no longer be present.
Version 2.70--September 14, 1998
- This version is capable of editing complete bacterial chromosomes
or large eukaryotic chromosomal segments in a single record.
The generation of reports (i.e., GenBank and Graphic view) and
validation are now much faster.
- Sequin can now annotate features by reading in a tab-delimited
table. The table specifies the location and type of feature, and
Sequin processes the feature intervals and translates any CDSs. The
table is read in the record viewer (after the sequence has been
imported) using the File-->Open menu. The table must follow a defined
format. The first line starts with >Feature, a space, and then the
Sequence ID of the sequence you are annotating. In the example below,
eIF4E is the Sequence ID. The table is composed of five columns:
start, stop, feature key, qualifier key, and qualifier value. The
columns are separated by tabs. The first row has start, stop, and
feature key. Additional feature intervals just have start and stop.
The qualifiers follow on lines starting with three tabs.
For example, a table which looks like this:
>Features eIF4E
80	2881	gene
			gene	eIF4E
201	224	CDS
1550	1920
1986	2085
2317	2404
2466	2629
			product	eukaryotic initiation factor 4E-II
1402	1458	CDS
1550	1920
1986	2085
2317	2404
2466	2629
			product	eukaryotic initiation factor 4E-I
			note	encoded by two messenger RNAs
80	224	mRNA
1550	1920
1986	2085
2317	2404
2466	2881
			product	eukaryotic initiation factor 4E-II
80	224	mRNA
892	1458
1550	1920
1986	2085
2317	2404
2466	2881
			product	eukaryotic initiation factor 4E-I
80	224	mRNA
1129	1458
1550	1920
1986	2085
2317	2404
2466	2881
			product	eukaryotic initiation factor 4E-I
will result in a GenBank flatfile which contains this:
     mRNA            join(80..224,1129..1458,1550..1920,1986..2085,2317..2404,
                     2466..2881)
                     /gene="eIF4E"
                     /product="eukaryotic initiation factor 4E-I"
     mRNA            join(80..224,892..1458,1550..1920,1986..2085,2317..2404,
                     2466..2881)
                     /gene="eIF4E"
                     /product="eukaryotic initiation factor 4E-I"
     mRNA            join(80..224,1550..1920,1986..2085,2317..2404,2466..2881)
                     /gene="eIF4E"
                     /product="eukaryotic initiation factor 4E-II"
     gene            80..2881
                     /gene="eIF4E"
     CDS             join(201..224,1550..1920,1986..2085,2317..2404,2466..2629)
                     /gene="eIF4E"
                     /codon_start=1
                     /product="eukaryotic initiation factor 4E-II"
                     /translation="MVVLETEKTSAPSTEQGRPEPPTSAAAPAEAKDVKPKEDPQETG
                     EPAGNTATTTAPAGDDAVRTEHLYKHPLMNVWTLWYLENDRSKSWEDMQNEITSFDTV
                     EDFWSLYNHIKPPSEIKLGSDYSLFKKNIRPMWEDAANKQGGRWVITLNKSSKTDLDN
                     LWLDVLLCLIGEAFDHSDQICGAVINIRGKSNKISIWTADGNNEEAALEIGHKLRDAL
                     RLGRNNSLQYQLHKDTMVKQGSNVKSIYTL"
     CDS             join(1402..1458,1550..1920,1986..2085,2317..2404,
                     2466..2629)
                     /gene="eIF4E"
                     /note="encoded by two messenger RNAs"
                     /codon_start=1
                     /product="eukaryotic initiation factor 4E-I"
                     /translation="MQSDFHRMKNFANPKSMFKTSAPSTEQGRPEPPTSAAAPAEAKD
                     VKPKEDPQETGEPAGNTATTTAPAGDDAVRTEHLYKHPLMNVWTLWYLENDRSKSWED
                     MQNEITSFDTVEDFWSLYNHIKPPSEIKLGSDYSLFKKNIRPMWEDAANKQGGRWVIT
                     LNKSSKTDLDNLWLDVLLCLIGEAFDHSDQICGAVINIRGKSNKISIWTADGNNEEAA
                     LEIGHKLRDALRLGRNNSLQYQLHKDTMVKQGSNVKSIYTL"
Note that if the gene feature spans the intervals of the CDS and mRNA
features for that gene, you don't need to include gene "qualifiers" in
those features, since they will be picked up by overlap.
Features which are on the complementary strand are indicated by reversing
the interval locations. For example, the table:
>Features dna2
2710	2639	tRNA
			note	codon recognised: GAA
			product	tRNA-Glu
			anticodon	(pos:2675..2677, aa:Glu)
will result in a GenBank flatfile containing:
     tRNA            complement(2639..2710)
                     /note="codon recognised: GAA"
                     /product="tRNA-Glu"
                     /anticodon=(pos:2675..2677, aa:Glu)
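
The table rules above can be sketched in a few lines of Python. This is only
an illustration of the format as described in these notes, not Sequin's own
code: the function names (parse_feature_table, location_string) are
hypothetical, any header line starting with ">" is accepted (the text says
">Feature" while the examples show ">Features"), and the handling of
multi-interval features on the complementary strand is an assumption.

```python
def parse_feature_table(text):
    """Parse a tab-delimited feature table into (seq_id, features)."""
    seq_id = None
    features = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(">"):
            # Header line: keyword, a space, then the Sequence ID.
            parts = line.split(None, 1)
            seq_id = parts[1].strip() if len(parts) > 1 else None
            continue
        cols = line.split("\t")
        cols += [""] * (5 - len(cols))  # pad to five columns
        start, stop, key, qual_key, qual_val = cols[:5]
        if start and stop:
            if key:
                # A row with a feature key starts a new feature.
                features.append({"key": key, "intervals": [], "qualifiers": []})
            # Rows with only start and stop add an interval to the last feature.
            features[-1]["intervals"].append((int(start), int(stop)))
        elif qual_key:
            # Rows starting with three tabs carry a qualifier.
            features[-1]["qualifiers"].append((qual_key, qual_val))
    return seq_id, features


def location_string(intervals):
    """Render intervals as a GenBank-style location string."""
    # Reversed coordinates indicate the complementary strand.
    minus = all(start > stop for start, stop in intervals)
    if minus:
        # Assumption: minus-strand intervals are listed 5' to 3', so they are
        # reversed to give ascending coordinates inside complement(...).
        spans = [f"{stop}..{start}" for start, stop in reversed(intervals)]
    else:
        spans = [f"{start}..{stop}" for start, stop in intervals]
    loc = spans[0] if len(spans) == 1 else "join(" + ",".join(spans) + ")"
    return f"complement({loc})" if minus else loc


TABLE = (
    ">Features eIF4E\n"
    "80\t2881\tgene\n"
    "\t\t\tgene\teIF4E\n"
    "201\t224\tCDS\n"
    "1550\t1920\n"
    "1986\t2085\n"
    "2317\t2404\n"
    "2466\t2629\n"
    "\t\t\tproduct\teukaryotic initiation factor 4E-II\n"
)

seq_id, features = parse_feature_table(TABLE)
for feature in features:
    print(feature["key"], location_string(feature["intervals"]))
```

For the eIF4E excerpt this prints the gene as 80..2881 and the CDS as a
join(...) of its five intervals. Translating the CDS and picking up gene
qualifiers by overlap are left out of the sketch.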
Version 2.60--June 2, 1998
- You can now open a FASTA-formatted DNA sequence file in Sequin
without first creating a Sequin record. On the Welcome to Sequin Form,
click on "Read Existing Record" to read in your sequence and open it in
the record viewer. Or, if you are already viewing a record in Sequin,
choose File-->Open to open a FASTA-formatted DNA sequence. However,
although the sequence will be displayed in Sequin and can be analysed
with tools such as PowerBLAST, Vector Screen, or ORF Finder, it should
not be submitted, because it does not have the appropriate annotations
or the required contact information to make it a valid submission.
- A variety of minor bugs have been fixed (affecting all platforms).
Version 2.45--March 3, 1998
- Easier sequence annotations
You can now use the Sequence Editor Feature menu, as well as the main Sequin
Annotate menu, to annotate features on the sequence. The features listed are
identical, and the instructions for adding them are the same, with one exception.
If you annotate them in the Annotate menu, you must provide the nucleotide
sequence location of the feature. However, if you add features from the Sequence
Editor, you do not need to enter the nucleotide coordinates manually. Simply
highlight the sequence which the feature covers, and the location of the sequence
will be automatically entered in the feature location box.
- New PowerBLAST features
PowerBLAST capabilities have been enhanced. When you do a PowerBLAST from
within Sequin, you can limit a search either for or against an organism or
taxonomic group. Under Organism Filter, click on "Restrict to" to limit your
search to a particular organism. Or, conversely, click on "Filter against"
to search against all organisms except one. Type the scientific name of the
organism (e.g., Homo sapiens) or taxonomic group (e.g., Mammalia) in the "Name"
box. After you do a PowerBLAST search, additional controls will be added to
the bottom of the record viewer window. These controls allow you to retrieve
the PowerBLAST hits from Entrez, and then look for Entrez neighbors. Use the
alignment pop-up to select the type of alignment (search) that was performed.
If multiple blast search types were run in one PowerBLAST search, this allows
you to get one type at a time. Then click on the Retrieve button to retrieve
the records in a document summary window, where you can view Medline, Protein,
Nucleotide, Structure, and Genome neighbors of the sequence(s). Click on the
Refine button to open a query refinement window in which you can further refine
the PowerBLAST hits by selecting other Entrez terms, such as Author name to
view sequences belonging to a specific author.
- Replacing or updating your sequence
As first described for Version 2.28 (below), you can replace or merge the sequence
in the record with a new sequence without going into the Sequence Editor.
This option is available in the Update Sequence submenu of the main Sequin
Edit menu. In addition to being able to import a sequence in FASTA format (Read
FASTA File), import a sequence record in ASN.1 format (Read Sequence Record),
or download a sequence record from Entrez (Download Accession), you can now
import a sequence from a Sequin PowerBLAST alignment (Selected Alignment).
Note that in all cases, both the target and the imported sequence must be
nucleotide sequences. The alignment between your original sequence and the
imported sequence can be viewed in a separate window. You can then choose
to merge the 5' or 3' end of the imported sequence with the target sequence
in the record, or replace the target sequence with the imported sequence.
The features on the imported sequence will be automatically copied to the
original sequence. You can also choose only to propagate features from the
imported sequence record to the target.
- Contact information for future submissions
The contact, authors, and affiliation information you provide on the Submitting
Authors form can now be saved as a block and used for subsequent submissions.
For your first Sequin submission, fill in the requested information. Then,
in the record viewer, click on "Edit Submitter Info" under the Edit menu,
and then on Export Submitter Info under the File menu in the resulting Submission
Instructions form. For subsequent Sequin submissions, click on Import Submitter
Info on the first page of the Submitting Authors form. You must still fill
in the manuscript title on this page, though.
- Formatting segmented population/phylogenetic sets
Sequin can now read segmented sets of sequences which are parts of phylogenetic,
population, or mutation studies. A segmented set is a collection of non-overlapping
sequences which cover a specified genetic region, such as a set of exons along
with fragments of flanking introns. The sequences must be in FASTA or FASTA+GAP
format. Each segment should have its own sequence identifier (the term immediately
following the ">"), but organism name and source modifiers should only be indicated
for the first segment from each sequence. Square brackets are used to delimit
the members of a set. For example:
[
>bioseq1part1 [org=Mus musculus] [strain=BALB/c]
CAGATGGCTCC
>bioseq1part2
ATAATGACAGCTTCATAATGGCAGTGGGTGAGCCCCTGGTGCACATCAG
]
[
>bioseq2part1 [org=Rattus norvegicus] [strain=Sprague-Dawley]
CAGTCGGCTCC
>bioseq2part2
ATAATGATGTCTTCATAATGGCAGAAAGTGAGCCCCTGGTGCACATCAG
]
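
As a rough illustration of how such a file is organised, the following Python
sketch groups the segments of each bracketed set. It is not Sequin's parser:
the function name parse_segmented_sets is hypothetical, and the sketch assumes
that each bracket sits on a line of its own, as in the second set above.

```python
def parse_segmented_sets(text):
    """Return a list of sets; each set is a list of (defline, sequence)."""
    sets = []
    current = None  # segments of the set being read, or None outside a set
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "[":
            current = []
            sets.append(current)
        elif line == "]":
            current = None
        elif line.startswith(">"):
            if current is None:
                raise ValueError("segment found outside a bracketed set")
            current.append([line[1:].strip(), ""])
        else:
            if not current:
                raise ValueError("sequence data found before any '>' line")
            current[-1][1] += line  # sequence may span several lines
    return [[tuple(segment) for segment in members] for members in sets]


EXAMPLE = """\
[
>bioseq1part1 [org=Mus musculus] [strain=BALB/c]
CAGATGGCTCC
>bioseq1part2
ATAATGACAGCTTCATAATGGCAGTGGGTGAGCCCCTGGTGCACATCAG
]
[
>bioseq2part1 [org=Rattus norvegicus] [strain=Sprague-Dawley]
CAGTCGGCTCC
>bioseq2part2
ATAATGATGTCTTCATAATGGCAGAAAGTGAGCCCCTGGTGCACATCAG
]
"""

for number, members in enumerate(parse_segmented_sets(EXAMPLE), start=1):
    print(number, [defline.split()[0] for defline, _ in members])
```

Each set comes back as an ordered list of segments, with the organism and
source modifiers still attached to the definition line of the first segment.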
- Creating automatic definition lines
Sequin can now create definition lines (sequence titles) automatically based
on information provided in the record. This option works for single sequences
as well as sets of sequences, and can handle complex annotations with multiple
features. The definition lines will follow standard GenBank conventions. Use
the function "Generate Definition Line" under the Sequin Annotate menu.
- Encoding new information in definition line
If you are submitting the sequence for an organism which is not present in
the NCBI taxonomy database, you can indicate the lineage of the organism on
the first line (definition line) of your FASTA-formatted nucleotide sequence.
Use the modifier [lineage=lineage] on the line where other modifiers are indicated.
For example,
>dna1 [org=Neworganism] [strain=A] [lineage=Newlineage]
GGGGGGGGGGAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTCCCCCCCCCCCCCGGGGGGGGGGG
AAAAAAATTTTTTTTTTTTTCCCCCCCCCCCCCC
For information about the organisms presently in GenBank, see the NCBI
taxonomy browser at http://www.ncbi.nlm.nih.gov/Taxonomy/
Additional source information can also be encoded directly in the definition
line. You can now indicate
[location={genomic,chloroplast,kinetoplast,mitochondrion,macronuclear,extrachromosomal,plasmid,transposon,insertion sequence,cyanelle,proviral,virion}]
and [molecule={dna,rna}]. In each case you can pick one item from the list
in {}, so a sample definition line could be:
>dna1 [org=Homo sapiens] [location=genomic] [molecule=dna]
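
The bracketed [modifier=value] syntax is easy to check before a file is read
into Sequin. The Python sketch below is an illustration only: parse_defline
and check_modifiers are hypothetical helper names, and the permitted values
are simply copied from the lists given above.

```python
import re

MODIFIER = re.compile(r"\[([^=\]]+)=([^\]]*)\]")

# Permitted values as listed in these notes.
LOCATIONS = {
    "genomic", "chloroplast", "kinetoplast", "mitochondrion", "macronuclear",
    "extrachromosomal", "plasmid", "transposon", "insertion sequence",
    "cyanelle", "proviral", "virion",
}
MOLECULES = {"dna", "rna"}


def parse_defline(defline):
    """Split a FASTA definition line into (seq_id, modifiers, title)."""
    text = defline.lstrip(">").strip()
    parts = text.split(None, 1)
    seq_id = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    modifiers = {key.strip(): value.strip() for key, value in MODIFIER.findall(rest)}
    # Whatever is left after removing the modifiers is treated as the title.
    title = " ".join(MODIFIER.sub(" ", rest).split())
    return seq_id, modifiers, title


def check_modifiers(modifiers):
    """Return a list of problems with the location and molecule values."""
    problems = []
    if "location" in modifiers and modifiers["location"] not in LOCATIONS:
        problems.append("unknown location: " + modifiers["location"])
    if "molecule" in modifiers and modifiers["molecule"] not in MOLECULES:
        problems.append("unknown molecule: " + modifiers["molecule"])
    return problems


seq_id, modifiers, title = parse_defline(
    ">dna1 [org=Homo sapiens] [location=genomic] [molecule=dna]"
)
print(seq_id, modifiers, check_modifiers(modifiers))
```

The same helper reads the [lineage=...] modifier, since it treats every
bracketed key=value pair alike.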
- Direct submission information
The DIRECT SUBMISSION reference for your new submission will now appear as
it will once the record is released to the public. In the past, this information
was stored by Sequin, but not displayed. This citation lists the authors who
should receive scientific credit for the sequence, and may have as many authors
as you see fit. It should have, at the very least, one author. Author names
are initially entered on the Submitting Authors form. You can modify the list
by double clicking on the reference.
- Minor changes
- Reference publication types now include Proceedings (meetings) and Proceedings
Chapter (meeting abstracts).
- When you highlight a range of sequence in the Sequence Editor, the selected
sequence is shown as a box in the Graphic view of the record.
- An option called "Select Target" was added to the Sequin Search menu.
This option changes the sequence which is selected in the Target Sequence
pop-up.
Version 2.28--September 19, 1997
- You can now replace or merge the sequence in the record with a new sequence
without going into the Sequence Editor. In the main Sequin window, choose one
of the three items in the Edit-->Update Sequence submenu. You can import a
sequence in FASTA format (Read FASTA File), a sequence record in ASN.1 format
(Read Sequence Record), or, if you are running Sequin in its Network Aware
mode, download a sequence record from Entrez (Download Accession).
The alignment between your original sequence and the imported sequence can be
viewed in a separate
window. You can then choose to merge the 5' or 3' end of the imported
sequence with the target sequence in the record, or replace the target
sequence with the imported sequence. The features on the imported sequence
will be automatically copied to the original sequence. You can also choose
only to propagate features from the imported sequence record to the target.
- If you are submitting an aligned set of sequences, and one or more of the sequences
is already present in the GenBank/EMBL/DDBJ database, you can mark that
sequence(s) so that it does not get a new accession number. Instead of
providing the sequence(s) with a new Sequence Identifier, add 'acc' to the
existing accession or gi number. For instance, use the identifier accU12345, where
U12345 is the existing accession number. The sequence does not need a
title since it is not being resubmitted to the database. Thus, an example of a nucleotide definition line would be:
>accU54469
- You can now encode a comment in a protein definition line, and the text of
the comment will turn into a /note on the CDS feature. For example, if the
definition line for the protein sequence is
>aa1 [gene=eIF4E] [prot=eukaryotic initiation factor 4E-I] [comment=alternative splice product] Drosophila melanogaster eukaryotic initiation factor 4E-I, complete sequence
the corresponding CDS feature will have the following fields:
/gene="eIF4E"
/note="alternative splice product"
/product="eukaryotic initiation factor 4E-I"
- PowerBLAST capabilities have been enhanced. From within Sequin, you can
perform blastn or tblastn searches of the sequence(s) in the record against
many NCBI supported nucleotide databases, and blastp or blastx searches
against protein databases. PowerBLAST can now also handle large sequences.
For additional information, see the BLAST home page at
http://www.ncbi.nlm.nih.gov/BLAST/.
- A variety of minor bugs have been fixed (affecting all platforms).
Version 2.20--July 29, 1997
- In the Sequence Editor, the "Find" command under the Edit menu now searches
the displayed nucleotide sequence for amino acid as well as nucleotide sequence
patterns. If you type in an amino acid sequence, Sequin will search for that
sequence in a three-frame translation of the nucleotide sequence.
For example,
- CDLPEYC finds DNA sequences encoding CDLPEYC
- [CRQ]DLPEYC finds DNA sequences encoding C or R or Q followed by DLPEYC
- XDLPEYC finds DNA sequences encoding any amino acid followed by DLPEYC
- CDL(3)EYC finds DNA sequences encoding CDLEEEYC
- CDL(1:3)PE finds DNA sequences encoding CDLPE, CDLPPE, and CDLPPPE
- CDL(1:3)XE finds DNA sequences encoding CDL, followed by 1-3 occurrences of
any amino acid, followed by E, i.e., CDLAAE, CDLRSE, or CDLAPQE
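
The pattern syntax in the examples above can be read as follows: a count in
parentheses applies to the symbol that follows it, square brackets list
alternatives, and X stands for any amino acid. The Python sketch below turns
such a pattern into a regular expression and searches the three forward-frame
translations of a nucleotide sequence. It illustrates the idea only and is not
Sequin's algorithm; the function names and the short test sequence are
hypothetical, the standard genetic code is assumed, and overlapping matches
within a frame are not reported.

```python
import re
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# A token is a count "(n)" or "(m:n)", a bracketed set, or a single letter.
TOKEN = re.compile(r"\((\d+)(?::(\d+))?\)|\[([A-Za-z]+)\]|([A-Za-z])")


def pattern_to_regex(pattern):
    """Convert a Find pattern such as CDL(1:3)XE to a Python regex string."""
    pieces, repeat = [], ""
    for match in TOKEN.finditer(pattern):
        low, high, group, letter = match.groups()
        if low is not None:
            # A count applies to the symbol that follows it.
            repeat = "{%s}" % low if high is None else "{%s,%s}" % (low, high)
            continue
        if group:
            unit = "[" + group.upper() + "]"
        elif letter.upper() == "X":
            unit = "[^*]"  # any amino acid, but not a stop codon
        else:
            unit = letter.upper()
        pieces.append(unit + repeat)
        repeat = ""
    return "".join(pieces)


def translate(dna, frame):
    """Translate one forward frame (0, 1, or 2); unknown codons become X."""
    dna = dna.upper()
    return "".join(
        CODONS.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3)
    )


def find_protein_pattern(dna, pattern):
    """Return (start, stop, peptide) hits with 1-based nucleotide positions."""
    regex = re.compile(pattern_to_regex(pattern))
    hits = []
    for frame in range(3):
        for match in regex.finditer(translate(dna, frame)):
            start = frame + 3 * match.start()
            stop = frame + 3 * match.end()
            hits.append((start + 1, stop, match.group()))
    return sorted(hits)


# Hypothetical sequence: bases 3-23 encode CDLPEYC.
DNA = "GGTGTGATCTGCCAGAATATTGTA"
print(find_protein_pattern(DNA, "CDLPEYC"))
print(find_protein_pattern(DNA, "CDL(1:3)PE"))
```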
- For gene features on a segmented set, the location is now specified by an
"order" rather than a "join". In the past, a gene feature on a segmented set
was indicated on the last record in the set as follows:
     gene            join(AF000001:1..2000,1..5388)
                     /gene="testbar"
In this version of Sequin, the location of the gene feature is shown on the last
record as follows:
     gene            order(AF000001:1..2000,1..5388)
                     /gene="testbar"
- The order in which the gene, CDS, and other features are displayed has been
changed. In the past, the order of these features, if they covered the same
sequence interval, was random. Now, features with the same interval are always
displayed in the following order:
gene
CDS
any_feature
Please note that, as always, the order in which features are displayed depends on their left-most sequence position. Thus, the source feature, whose left-most position is "1", is always the first feature. For example,
     source          1..4495
                     /organism="Homo sapiens"
     gene            86..4339
                     /gene="ABC1"
     CDS             86..4339
                     /gene="ABC1"
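
The ordering rule can be expressed as a two-part sort key. The Python sketch
below is an illustration, not Sequin's implementation; display_order is a
hypothetical name, and the treatment of features that share a left-most
position but end at different positions is an assumption, since the text only
covers identical intervals.

```python
# gene first, then CDS, then any other feature.
RANK = {"gene": 0, "CDS": 1}


def display_order(features):
    """Sort (key, start, stop) tuples by left-most position, then by type."""
    return sorted(
        features,
        key=lambda f: (min(f[1], f[2]), RANK.get(f[0], 2)),
    )


FEATURES = [
    ("CDS", 86, 4339),
    ("gene", 86, 4339),
    ("source", 1, 4495),
]
for key, start, stop in display_order(FEATURES):
    print(key, f"{start}..{stop}")
```

For the ABC1 example this lists source, then gene, then CDS.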
- Under the Sequin Search menu, the command previously called "Find" has been renamed "Find ASN.1". This command allows you to find and replace strings of text in your submission.
- Under the Sequin Search menu, a new command called "Find FlatFile" has been added. This command allows you to find strings of text in your submission.
- Under the Sequin Annotate menu, an additional choice called "Remaining Features" has been added. Now any feature which is legal under the DDBJ/EMBL/GenBank feature table can be added to a submission.
- The default HUP date is now one year from the current date. The HUP (hold until published) date is the date, specified by you, on which the sequence can be released to the public.
- In the Sequence Editor, the "Label" option is now available under the View menu. This option allows you to choose how sequence names are displayed in an alignment.
- A number of bugs in the Sequence Editor have been fixed.
- A number of bugs specific to the DEC Alpha OSF1 version of Sequin have been fixed.
Version 2.14--July 5, 1997
- Network Entrez, an NCBI tool for accessing bibliographic, sequence, and
structure records, is now fully integrated into Sequin. As before, users
can download sequences from Entrez to view or to edit and resubmit to the
databases. Now, users can also view any type of record on demand from within
Sequin. Users are reminded that they can only update their own records this
way.
- PowerBLAST, a version of the client for the popular BLAST software
for sequence comparisons, is now available from within Sequin. Users
can compare their sequence to sequences in the nucleotide and protein
databases and view the results from within Sequin.
- A new algorithm is now used to calculate global sequence alignments.
You can have Sequin calculate and display the alignment between a
sequence in the record and another sequence in a file. In the sequence
editor, select the option "Align with" under the Edit menu.
- Sequin will now recognise sequence alignments which have been saved in one of
two NEXUS formats: NEXUS Interleaved and NEXUS Contiguous.
- Records can now be viewed with two new Display Formats, Summary and Sequence.
Summary format shows the range of any sequence alignments in the record.
Sequence format shows the sequence(s) in the record along with any
associated features.
- The location of certain menu items has been changed. All features
and descriptors are now accessed from the Annotate menu.
- A variety of minor bugs have been fixed (affecting all platforms).
Version 1.94--March 12, 1997
- We have increased support for phylogenetic/population study
submissions. The supporting documentation has been extended and now
includes instructions on how to propagate features (such as CDS or
rRNA) through alignments.
- We have added pattern matching for nucleotide sequences using regular expressions.
This is used via the "Find" command in the Sequence Editor Edit menu.
For example,
- TCAGGGC finds the sequence TCAGGGC
- [TCA]CAGGGC finds T or C or A followed by CAGGGC
- NCAGGGC finds T or C or G or A followed by CAGGGC
- TCA(3)GC finds the sequence TCAGGGC
- TCA(1:3)GC finds the sequences TCAGC, TCAGGC, and TCAGGGC
- TCA(1:3)NC finds the sequence TCA, followed by 1-3 occurrences of
G, A, T, or C, followed by C, i.e., TCATC or TCATTC or TCAATGC
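
In these examples a count in parentheses applies to the base that follows it,
square brackets list alternatives, and N stands for any base. The Python
sketch below converts such a pattern to a regular expression and reports
matches. It illustrates the syntax as described here and is not the code
Sequin uses; the function names and the short test sequence are hypothetical.

```python
import re

# A token is a count "(n)" or "(m:n)", a bracketed set, or a single letter.
TOKEN = re.compile(r"\((\d+)(?::(\d+))?\)|\[([A-Za-z]+)\]|([A-Za-z])")


def nucleotide_pattern_to_regex(pattern):
    """Convert a Find pattern such as TCA(1:3)NC to a Python regex string."""
    pieces, repeat = [], ""
    for match in TOKEN.finditer(pattern):
        low, high, group, letter = match.groups()
        if low is not None:
            # A count applies to the base that follows it.
            repeat = "{%s}" % low if high is None else "{%s,%s}" % (low, high)
            continue
        if group:
            unit = "[" + group.upper() + "]"
        elif letter.upper() == "N":
            unit = "[ACGT]"  # N matches any of the four bases
        else:
            unit = letter.upper()
        pieces.append(unit + repeat)
        repeat = ""
    return "".join(pieces)


def find_nucleotide_pattern(sequence, pattern):
    """Return (start, stop, match) tuples with 1-based positions."""
    regex = re.compile(nucleotide_pattern_to_regex(pattern))
    return [
        (m.start() + 1, m.end(), m.group())
        for m in regex.finditer(sequence.upper())
    ]


print(nucleotide_pattern_to_regex("TCA(3)GC"))
print(find_nucleotide_pattern("AATCAGGGCTT", "TCA(1:3)GC"))
```

Only the strand as given is searched, and overlapping matches are not
reported.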
- You must now enter the scientific, not the common, organism name when
preparing a new submission. Sequin, as well as the GenBank/EMBL/DDBJ
record, will still list both scientific and common names, however.
- A variety of minor bugs have been fixed (affecting all platforms).
Revised June 2, 1998
Comments and questions to: info@ncbi.nlm.nih.gov (NCBI) or
http://www.ebi.ac.uk/support/ (EBI)