Assembly description files

This document explains how to prepare the genome assembly description files that accompany an assembly submission.

Chromosome list file

The chromosome list file is a complete list of the chromosomes defined in the assembly. For this purpose, chromosomes also include organelle (e.g. mitochondrion and chloroplast) and plasmid sequences. This tab-separated text file (USASCII7 character set) is mandatory when the genome assembly contains assembled chromosomes.
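A chromosome list file can be produced with Python's standard `csv` module. The sketch below assumes hypothetical object names (`chr01`, `chr02`, `chrM`) and chromosome names, with columns in the order defined in this section. It writes one tab-separated row per chromosome and opens the output with `encoding="ascii"`, so a non-ASCII character raises an error instead of being written silently.

```python
import csv
import io

# Hypothetical rows: (OBJECT_NAME, CHROMOSOME_NAME, CHROMOSOME_TYPE[, CHROMOSOME_LOCATION])
ROWS = [
    ("chr01", "1", "Chromosome"),
    ("chr02", "2", "Chromosome"),
    ("chrM", "MT", "Chromosome", "Mitochondrion"),
]


def chromosome_list_text(rows):
    """Return the chromosome list as tab-separated text, one row per line."""
    buffer = io.StringIO()
    # lineterminator="\n" replaces the csv module's default "\r\n" line endings
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


def write_chromosome_list(path, rows):
    """Write the file; encoding="ascii" raises UnicodeEncodeError on non-ASCII text."""
    with open(path, "w", encoding="ascii", newline="") as handle:
        handle.write(chromosome_list_text(rows))


if __name__ == "__main__":
    print(chromosome_list_text(ROWS), end="")
```

The fourth column is simply left out for rows that rely on the default cellular location.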

The columns are listed below. All columns are required except CHROMOSOME_LOCATION.

OBJECT_NAME (first column): The object name must match the entry name given in the flat file (AC * line), the FASTA file (sequence name) or the chromosome AGP file (first column).
CHROMOSOME_NAME (second column): The name or number of the sequenced replicon. This value will appear as the value of the /chromosome, /plasmid or /segment qualifier on ENA sequence records.
CHROMOSOME_TYPE (third column): The type of the chromosome, one of:
  • Chromosome
  • Plasmid
  • monopartite
  • segmented
  • multipartite
CHROMOSOME_LOCATION (optional; fourth column): By default, eukaryotic chromosomes are assumed to reside in the nucleus, and prokaryotic chromosomes and plasmids in the cytoplasm. A more specific cellular location can be given by selecting a value from the following list:
  • Macronuclear
  • Nucleomorph
  • Mitochondrion
  • Kinetoplast
  • Chloroplast
  • Chromoplast
  • Plastid
  • Virion
  • Phage
  • Proviral
  • Prophage
  • Viroid
  • Cyanelle
  • Apicoplast
  • Leucoplast
  • Proplastid
  • Hydrogenosome
  • Chromatophore
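Before submission, it can help to check each row against the column rules above. The following is a minimal sketch, not an official validator: it checks the column count, that the first two columns are non-empty, and that CHROMOSOME_TYPE and CHROMOSOME_LOCATION use one of the listed values. Comparing values case-insensitively and skipping blank lines are assumptions of this sketch, not documented rules.

```python
# Allowed values from the lists in this section. Matching them case-insensitively
# is an assumption of this sketch, not a documented rule.
CHROMOSOME_TYPES = {"chromosome", "plasmid", "monopartite", "segmented", "multipartite"}
CHROMOSOME_LOCATIONS = {
    "macronuclear", "nucleomorph", "mitochondrion", "kinetoplast", "chloroplast",
    "chromoplast", "plastid", "virion", "phage", "proviral", "prophage", "viroid",
    "cyanelle", "apicoplast", "leucoplast", "proplastid", "hydrogenosome",
    "chromatophore",
}


def check_chromosome_list(lines):
    """Return human-readable problems found in the lines of a chromosome list file."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are skipped here (an assumption of this sketch)
        fields = line.split("\t")
        if len(fields) not in (3, 4):
            problems.append(f"line {number}: expected 3 or 4 columns, found {len(fields)}")
            continue
        object_name, chromosome_name, chromosome_type = fields[:3]
        if not object_name or not chromosome_name:
            problems.append(f"line {number}: OBJECT_NAME and CHROMOSOME_NAME must not be empty")
        if chromosome_type.lower() not in CHROMOSOME_TYPES:
            problems.append(f"line {number}: unknown CHROMOSOME_TYPE {chromosome_type!r}")
        if len(fields) == 4 and fields[3].lower() not in CHROMOSOME_LOCATIONS:
            problems.append(f"line {number}: unknown CHROMOSOME_LOCATION {fields[3]!r}")
    return problems


if __name__ == "__main__":
    sample = [
        "chr01\t1\tChromosome\n",
        "chrM\tMT\tChromosome\tMitochondrion\n",
        "chr03\t3\tChromosome\tNucleus\n",  # the nucleus is the default, so it is not a listed value
    ]
    for problem in check_chromosome_list(sample):
        print(problem)
```

Run on the hypothetical sample, only the third row is reported, because the nucleus is the default location for eukaryotic chromosomes and should be expressed by omitting the fourth column.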

For an example, please see: chromosome.txt
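The OBJECT_NAME rule can also be cross-checked against the sequence file. This sketch covers only the FASTA case (not flat files or AGP) and assumes that the FASTA sequence name is the first whitespace-delimited word after `>`. The function names are hypothetical.

```python
def fasta_sequence_names(fasta_lines):
    """Collect sequence names, taken here to be the first word after '>' (an assumption)."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                names.add(header.split()[0])
    return names


def unmatched_object_names(chromosome_lines, fasta_lines):
    """Return OBJECT_NAME values (first column) that match no FASTA sequence name."""
    known = fasta_sequence_names(fasta_lines)
    missing = []
    for raw in chromosome_lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        object_name = line.split("\t")[0]
        if object_name not in known:
            missing.append(object_name)
    return missing


if __name__ == "__main__":
    fasta = [">chr01 hypothetical description\n", "ACGT\n", ">chrM\n", "ACGT\n"]
    chromosomes = ["chr01\t1\tChromosome\n", "chr02\t2\tChromosome\n"]
    print(unmatched_object_names(chromosomes, fasta))
```

With these hypothetical inputs, `chr02` is reported because no FASTA record carries that name.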

Unlocalised list file

The unlocalised list file is a tab-separated text file (USASCII7 character set) that can be submitted when the assembly includes unlocalised sequences: sequences that are associated with a specific chromosome but whose order and orientation are unknown. An unlocalised list file can only be submitted if the assembly defines chromosomes, and even then it is optional.

Required fields are:

OBJECT_NAME (first column): The name of the unlocalised sequence in an AGP file.
CHROMOSOME_NAME (second column): The name of the chromosome associated with this sequence.

For an example, please see: unlocalised.txt
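Because an unlocalised list only makes sense alongside defined chromosomes, a pre-submission check can read both files together. This sketch assumes that CHROMOSOME_NAME in the unlocalised list refers to the value in the second column of the chromosome list file; the function name and sample values are hypothetical.

```python
def check_unlocalised_list(unlocalised_lines, chromosome_lines):
    """Return problems found in an unlocalised list, checked against a chromosome list."""
    # Assumption: CHROMOSOME_NAME refers to the chromosome list's second column.
    chromosome_names = set()
    for raw in chromosome_lines:
        fields = raw.rstrip("\r\n").split("\t")
        if len(fields) >= 2 and fields[1]:
            chromosome_names.add(fields[1])
    if not chromosome_names:
        return ["no chromosomes defined: an unlocalised list file cannot be submitted"]

    problems = []
    for number, raw in enumerate(unlocalised_lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(f"line {number}: expected 2 columns, found {len(fields)}")
            continue
        object_name, chromosome_name = fields
        if chromosome_name not in chromosome_names:
            problems.append(
                f"line {number}: {object_name!r} refers to undefined chromosome {chromosome_name!r}"
            )
    return problems


if __name__ == "__main__":
    chromosomes = ["chr03\t3\tChromosome\n"]
    unlocalised = ["chr03_random1\t3\n", "chr04_random1\t4\n"]
    for problem in check_unlocalised_list(unlocalised, chromosomes):
        print(problem)
```

In the hypothetical sample, the second row is reported because chromosome `4` is not defined in the chromosome list.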

More information

Read about contig, scaffold, chromosome files or go back to the genome assembly submissions main page.