Submitting synthetic constructs

The European Nucleotide Archive (ENA) offers submission, archiving and presentation services for synthetic molecules and constructs. Here we describe submission policies, content and presentation services that apply to sequences from synthetic molecules and constructs.

Content

ENA currently holds sequence records corresponding to synthetic molecules and constructs in the following categories: primer sequences, cloning vectors, expression vectors, shuttle vectors, artificial genes, artificial sequences, artificial oligonucleotides, synthetic DNA, synthetic RNA.

On 01 April 2014, ENA held 5,142,920 sequences from molecules labelled as synthetic, of which 5,039,792 were records related to patent processes and 103,128 were non-patent-related records.

Data requirements for synthetic molecules and constructs

ENA is a repository for nucleotide sequence data, and submitted sequences must have been resolved by a sequencing method. ENA therefore cannot currently accept sequences that have not been determined by sequencing, including sequences that solely represent the input design for a synthetic molecule.

ENA draws a distinction between primary data and data provided by third parties. Primary data are generated and submitted by a scientist, group or consortium; these people are the data owners and retain the right to edit and update interpretations (such as assemblies and functional annotation) of the data. Third parties are defined as those submitters who retrieve existing data under other scientists' ownership and provide alternative interpretations, such as assemblies and functional annotation. Further details on Third Party Data are available.

For synthetic molecules that have been generated de novo, these requirements typically mean that the sequence of the molecule must have been resolved by a sequencing method before it can be accepted into ENA.

For regions of constructs that have been cloned from existing sources, these requirements often mean that a submission must refer to existing ENA data under Third Party Data arrangements.

For constructs that combine de novo synthetic molecules with regions cloned from existing sources, the regions that have been sequenced must first be submitted to ENA; Third Party Data are then prepared that describe the construct and the source sequences from which its overall sequence was derived.
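The three cases above can be summarised as a small decision procedure. The Python sketch below is illustrative only: the `Region` model, its field names, the `plan_submission` function and the accession `XX000010` are hypothetical and are not part of any ENA tool or API. It assumes that a cloned region with an existing ENA record is referred to under Third Party Data arrangements, and that any other region must have been sequenced to be accepted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    """One region of a construct (hypothetical model, for illustration only)."""
    name: str
    origin: str                                # "de_novo" or "cloned"
    sequenced: bool                            # resolved by a sequencing method?
    existing_accession: Optional[str] = None   # existing ENA record, if cloned from one


def plan_submission(regions: List[Region]) -> List[str]:
    """Return the ordered steps implied by the data requirements above."""
    primary: List[str] = []   # regions to submit first as primary data
    cited: List[str] = []     # existing ENA records referred to under TPA
    for r in regions:
        if r.origin == "cloned" and r.existing_accession is not None:
            cited.append(r.existing_accession)
        elif r.sequenced:
            primary.append(r.name)
        else:
            # Design-only sequence with no existing record: ENA cannot accept it.
            raise ValueError(f"{r.name}: not sequenced and no existing ENA record")
    steps = [f"submit {name} to ENA as primary data" for name in primary]
    if cited:
        # Cloned material is described by a Third Party Data record that
        # refers to the existing records (and to any newly submitted regions).
        steps.append(
            "prepare a Third Party Data record describing the construct, citing "
            + ", ".join(cited)
            + (" and the newly submitted regions" if primary else "")
        )
    return steps


regions = [
    Region("linker_1", "de_novo", sequenced=True),
    Region("insert_1", "cloned", sequenced=False, existing_accession="XX000010"),
]
for step in plan_submission(regions):
    print(step)
```

Raising an error for unsequenced, design-only regions mirrors the rule that ENA cannot take sequence that solely constitutes the input design for a synthetic molecule.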

Annotation notes

The synthetic origin of a sequenced molecule (or of a region within a sequenced molecule) is indicated using the organism field, the field in which taxonomic annotation is normally provided. The annotation "synthetic construct" is used in this field when the construct has not been given a name. Where the construct has been named, as in the case of a cloning vector, the name is provided in this field and ENA will arrange for this name to be classified taxonomically within an appropriate "synthetic construct" lineage.

In entries that comprise multiple sources (such as Third Party Data records describing a mixture of cloned and synthesised molecules), the coordinate range of each source within the submitted sequence must be indicated. The /focus qualifier is a required annotation that indicates which source (which may be "synthetic construct" or a more conventional taxonomic annotation such as "Mus musculus") should be considered the principal taxonomic classification for the record. This is important for the overall taxonomic classification of the construct, which determines how users will find the record, and for selecting an appropriate default translation table for coding regions.
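A minimal sketch of the multi-source rule, assuming each source is reduced to a 1-based inclusive range, an organism name and a flag for /focus. `SourceFeature` and `principal_organism` are hypothetical names, and the check covers only the points stated above (source ranges inside the sequence, exactly one /focus source when there are several sources); it is not ENA's own validator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceFeature:
    """A source feature in a multi-source entry (illustrative model)."""
    start: int            # 1-based start of the range, inclusive
    end: int              # 1-based end of the range, inclusive
    organism: str         # e.g. "synthetic construct" or "Mus musculus"
    focus: bool = False   # True if this source carries the /focus qualifier


def principal_organism(sources: List[SourceFeature], seq_length: int) -> str:
    """Check the source annotation and return the organism that should
    determine the record's overall taxonomic classification."""
    if not sources:
        raise ValueError("an entry needs at least one source feature")
    for s in sources:
        # Every source range must lie inside the submitted sequence.
        if not 1 <= s.start <= s.end <= seq_length:
            raise ValueError(f"{s.organism}: {s.start}..{s.end} is outside 1..{seq_length}")
    focused = [s for s in sources if s.focus]
    if len(sources) > 1 and len(focused) != 1:
        raise ValueError("multi-source entries need exactly one /focus source")
    return focused[0].organism if focused else sources[0].organism


sources = [
    SourceFeature(1, 2000, "synthetic construct", focus=True),
    SourceFeature(101, 820, "Mus musculus"),
]
print(principal_organism(sources, seq_length=2000))  # synthetic construct
```

Here the mouse-derived region sits inside a construct whose principal classification is "synthetic construct"; moving `focus=True` to the second source would instead make "Mus musculus" the principal classification.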

For general annotation guidelines please visit here.

For some examples see here.

Submission instructions

For submissions to ENA you need to have a submission account. Please register and log in to Webin.

To submit synthetic constructs you will need to create flat files in an ENA-supported format, either with third-party software that can save and export your annotation in our flat file format or by writing the flat files yourself. Please refer to the section Webin Entry Upload submissions here, and see this page for a synthetic construct specific Entry Upload Template. Once your flat files are ready, you can upload them using Webin:

Go to the 'New submission' link on the left hand menu and select 'Entry upload'.
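If you write flat files yourself, a short script can produce the feature table. The Python sketch below renders one feature in EMBL flat-file layout (FT line code, feature key starting in column 6, location and qualifiers starting in column 22). `feature_lines` is a hypothetical helper, the qualifier values are illustrative, and long lines are not wrapped. The output is only a fragment: the synthetic construct Entry Upload Template remains the reference for the complete set of lines a submission needs.

```python
from typing import List, Optional, Sequence, Tuple


def feature_lines(key: str, location: str,
                  qualifiers: Sequence[Tuple[str, Optional[str]]]) -> List[str]:
    """Render one feature-table entry as EMBL-style FT lines.

    The feature key starts in column 6 and the location and qualifiers in
    column 22. Long lines are NOT wrapped at 80 columns in this sketch.
    """
    lines = [f"FT   {key:<16}{location}"]
    for name, value in qualifiers:
        # Qualifiers without a value (such as /focus) are written bare.
        text = f"/{name}" if value is None else f'/{name}="{value}"'
        lines.append("FT" + " " * 19 + text)
    return lines


for line in feature_lines("source", "1..2000", [
    ("organism", "synthetic construct"),
    ("mol_type", "other DNA"),
    ("focus", None),
]):
    print(line)
```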

Search and retrieval

The ENA Browser provides web and REST programmatic access. All ENA data are searchable and available for download from the ENA Browser. For example, to retrieve all sequences from synthetic molecules and constructs using ENA's Advanced Search, select domain 'sequence' and taxonomic division 'SYN', then click 'Search'*. To filter further, for example to exclude patent-related records, additionally specify exclusion of the data class "PAT"**.

For detailed documentation on retrieval methods and search options please visit here.

*The results of this search can be reached directly here.

**The results of this search can be reached directly here.
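The division and data-class filter used in the Advanced Search example above can also be applied to EMBL-format flat files you have already downloaded, because the ID line of each entry records both values (data class in the fifth field, taxonomic division in the sixth). A minimal sketch in Python, assuming well-formed ID lines of the form shown in the docstring; the accessions `XX000001` to `XX000003` are invented placeholders, and this is a local filter, not a call to the ENA Browser or its REST interface.

```python
from typing import Iterable, List, NamedTuple


class IdLine(NamedTuple):
    """Fields of an EMBL flat-file ID line that matter for this filter."""
    accession: str
    data_class: str
    division: str


def parse_id_line(line: str) -> IdLine:
    """Parse an ID line such as
    'ID   XX000001; SV 1; circular; other DNA; STD; SYN; 3000 BP.'
    Fields are separated by semicolons; the data class is the 5th field
    and the taxonomic division the 6th."""
    if not line.startswith("ID   "):
        raise ValueError(f"not an ID line: {line!r}")
    fields = [f.strip() for f in line[5:].split(";")]
    if len(fields) < 7:
        raise ValueError(f"unexpected ID line layout: {line!r}")
    return IdLine(accession=fields[0], data_class=fields[4], division=fields[5])


def synthetic_non_patent(lines: Iterable[str]) -> List[str]:
    """Return accessions in division SYN whose data class is not PAT."""
    hits = []
    for line in lines:
        rec = parse_id_line(line)
        if rec.division == "SYN" and rec.data_class != "PAT":
            hits.append(rec.accession)
    return hits


id_lines = [
    "ID   XX000001; SV 1; circular; other DNA; STD; SYN; 3000 BP.",
    "ID   XX000002; SV 1; linear; unassigned DNA; PAT; SYN; 45 BP.",
    "ID   XX000003; SV 1; linear; mRNA; STD; MUS; 1200 BP.",
]
print(synthetic_non_patent(id_lines))  # ['XX000001']
```

The patent record is dropped by the data-class test and the mouse record by the division test, leaving only the non-patent synthetic entry.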
