Genome assembly database

The genome assembly database contains detailed information about genome assemblies for eukaryota, bacteria and archaea. Its scope does not extend to viruses, viroids and bacteriophages.

The following information is available on the genome assembly database:

Definition of terms

Term Description
Chromosome An assembled pseudomolecule that represents a biological chromosome. Most of the chromosome is expected to be represented by sequenced bases, although some gaps may still be present.
Placed sequence A sequence that has a known chromosomal location and orientation.
Unlocalized sequence A sequence that is associated with a specific chromosome without being ordered or oriented on that chromosome.
Unplaced sequence A sequence that is not associated with any specific chromosome.
Genomic region A named part of the primary assembly for which alternate loci or patches are available.
Alternate locus A sequence that provides an alternate representation of a locus. Alternate loci are collected into additional assembly units (i.e. not in the primary assembly).
Patch A sequence that provides a fix and/or novel sequence to the genome assembly.
Fix patch Sequence corrections or assembly gap reductions for the primary assembly introduced in a minor release. Fix patches are expected to be incorporated into the primary or alternate loci assembly units in the next major release.
Novel patch Novel sequences for the primary assembly introduced in a minor release. Novel patches are expected to be incorporated into the primary assembly unit in the next major release.
Assembly A set of chromosome assemblies, unlocalized and unplaced sequences, alternate loci and patches that represent a genome.
Assembly unit A subdivision of an assembly. Each assembly is organized into one or more assembly units.
Primary assembly unit Assembly unit that contains the set of assembled chromosomes, unlocalized and unplaced sequences that represent a non-redundant genome. Alternate loci and patches are not included in the primary assembly unit.
Major release A release of a genome assembly that contains a primary assembly and alternate loci, e.g. GRCh37.
Minor release A release of a genome assembly that adds patches to the major release, e.g. GRCh37.p5.
Assembly chain The major and minor releases form an assembly chain. For example, the assembly accession for GRCh37 major release is GCA_000001405.1. The assembly accession consists of two parts: the assembly chain accession (GCA_000001405) and the assembly version (1). The assembly version is incremented for each minor release while the assembly chain accession remains unchanged.
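
The two-part accession described above can be split mechanically. The following is a minimal Python sketch, not part of the database itself; the function name and the regular expression are assumptions made for illustration (the GCF_ alternative reflects the RefSeq prefix mentioned in the schema section).

```python
import re
from typing import NamedTuple

# Assumed pattern: "GCA_" or "GCF_" prefix, digits, a dot, then the version.
_ACCESSION_RE = re.compile(r"^(GC[AF]_\d+)\.(\d+)$")


class AssemblyAccession(NamedTuple):
    chain: str    # assembly chain accession, e.g. "GCA_000001405"
    version: int  # assembly version, e.g. 1


def split_assembly_accession(accession: str) -> AssemblyAccession:
    """Split an assembly accession into chain accession and version."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"not an assembly accession: {accession!r}")
    return AssemblyAccession(match.group(1), int(match.group(2)))


print(split_assembly_accession("GCA_000001405.1"))
# AssemblyAccession(chain='GCA_000001405', version=1)
```

Two accessions belong to the same assembly chain when their `chain` parts are equal; the `version` part then orders the releases within that chain.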

Assembly levels

Contig

The highest level of the primary assembly unit consists of contigs.

The contigs are available from the gc_unlocalised, gc_unplaced, gc_placed and gc_wgs_set tables. Please note that:

  • contigs in the gc_wgs_set table may also appear in the gc_unlocalised, gc_unplaced and gc_placed tables
  • some contigs may appear only in the gc_wgs_set table
  • some contigs may appear only in the gc_unlocalised, gc_unplaced or gc_placed tables
  • the gc_wgs_set table contains only the WGS set prefix rather than each individual contig

Scaffold

The highest level of the primary assembly unit consists of gapped contigs (scaffolds).

The scaffolds are available from the gc_replicon, gc_unplaced and gc_unlocalised tables. Please note that scaffolds in gc_replicon may or may not have sequence accession numbers associated with them. Only scaffolds with accession numbers in the gc_replicon table are available as sequence entries.

Chromosome

The highest level of the primary assembly unit consists of a mixture of assembled chromosomes, unlocalised and unplaced scaffolds.

The chromosomes are available from the gc_replicon table. The unlocalised and unplaced scaffolds are available from the gc_unlocalised and gc_unplaced tables. Please note that because there can be unlocalised sequences on a chromosome, some chromosomes in gc_replicon might not have accession numbers.

Complete chromosome with gaps

The highest level of the primary assembly unit consists of assembled chromosomes without any unlocalised or unplaced scaffolds.

The chromosomes are available from the gc_replicon table. Please note that all chromosomes in gc_replicon are expected to have accession numbers.

Gapless chromosome

Same as above, but the chromosomes do not contain any gaps.
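
As a quick reference, the assembly levels above can be summarised as a lookup from level to the tables that hold the corresponding sequences. This is only a sketch in Python: the dictionary and function names are hypothetical, and the level strings are assumed to match the values listed for the ASSEMBLY_LEVEL column.

```python
# Tables in which sequences can be found, per assembly level, as described
# in the text above. The names used here are illustrative only.
ASSEMBLY_LEVEL_TABLES = {
    "contig": ("gc_unlocalised", "gc_unplaced", "gc_placed", "gc_wgs_set"),
    "scaffold": ("gc_replicon", "gc_unplaced", "gc_unlocalised"),
    "chromosome": ("gc_replicon", "gc_unlocalised", "gc_unplaced"),
    "complete chromosome with gaps": ("gc_replicon",),
    "gapless chromosome": ("gc_replicon",),
}


def tables_for_level(level: str) -> tuple:
    """Return the tables to consult for a given assembly level."""
    key = level.strip().lower()
    if key not in ASSEMBLY_LEVEL_TABLES:
        raise KeyError(f"unknown assembly level: {level!r}")
    return ASSEMBLY_LEVEL_TABLES[key]


print(tables_for_level("scaffold"))
# ('gc_replicon', 'gc_unplaced', 'gc_unlocalised')
```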

Relational database

The genome collections information is made available as a relational database to Wellcome Trust Genome Campus (Hinxton) users. Please contact datasubs@ebi.ac.uk for further information.

Genome assembly DB schema

List of tables

Table name Description
GC_ASSEMBLY_SET Genome assembly sets. Each set is identified by set accession + version stored in column SET_ACC (e.g. GCA_000013085.1). Each assembly set consists of one or more assembly units. RefSeq assembly sets have GCF_ prefix, GenBank/EMBL-Bank/DDBJ assembly sets have GCA_ prefix.
GC_ASSEMBLY_UNIT Genome assembly units. Each unit is identified by unit accession + version stored in column UNIT_ACC (e.g. GCA_000010415.1). Each assembly unit is associated with one or more assembly sets. RefSeq assembly units have GCF_ prefix, GenBank/EMBL-Bank/DDBJ assembly units have GCA_ prefix.
GC_REPLICON Assembled chromosomes associated with an assembly unit.
GC_WGS_SET Whole genome shotgun (WGS) sets associated with an assembly unit.
GC_UNPLACED Unplaced sequences associated with an assembly unit. Nothing is known about the placement of these sequences.
GC_REGION Named chromosomal regions.
GC_PLACED Placed scaffolds or contigs on replicons.
GC_GAP Remaining gaps on replicons.
GC_UNLOCALISED Unlocalised sequences associated with a replicon. The exact placement of these sequences is not known.
GC_PROJECT Projects.
GC_WH_WGS_SET_STATS Basic nucleotide and protein sequence statistics associated with WGS sets.
GC_WH_SEQUENCE_STATS Basic nucleotide and protein sequence statistics associated with genome assembly sequences.
GC_ASSEMBLY_SET_XREF Associations between GenBank/INSDC and RefSeq assembly sets.
GC_ASSEMBLY_UNIT_XREF Associations between GenBank/INSDC and RefSeq assembly units.
GC_ASSEMBLY_SET_PROJECT Associations between assembly sets and projects.
GC_ASSEMBLY_UNIT_PROJECT Associations between assembly units and projects.
GC_ASSEMBLY_SET_STATS Assembly statistics for assembly sets.
GC_ASSEMBLY_UNIT_STATS Assembly statistics for assembly units.
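
The prefix convention noted for GC_ASSEMBLY_SET and GC_ASSEMBLY_UNIT (GCF_ for RefSeq, GCA_ for GenBank/EMBL-Bank/DDBJ) can be checked without querying the database. A minimal Python sketch follows; the function name is hypothetical.

```python
def assembly_source(accession: str) -> str:
    """Classify an assembly set or unit accession by its prefix."""
    if accession.startswith("GCF_"):
        return "RefSeq"
    if accession.startswith("GCA_"):
        return "GenBank/EMBL-Bank/DDBJ"
    raise ValueError(f"unrecognised assembly accession prefix: {accession!r}")


print(assembly_source("GCA_000013085.1"))
# GenBank/EMBL-Bank/DDBJ
```

Within the database itself the IS_REFSEQ column carries the same information, so a prefix check of this kind is mainly useful for accessions obtained elsewhere.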

Table columns

GC_ASSEMBLY_SET table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
SET_CHAIN The assembly set accession without version (e.g. GCA_000013085).
SET_VERSION The assembly set version.
NAME A short name for the assembly set.
LONG_NAME A long name for the assembly set.
IS_PATCH 'Y' if the assembly set is a patch assembly or 'N' if not.
IS_REFSEQ 'Y' if the assembly set is a RefSeq assembly or 'N' if not.
REFSEQ_SET_ACC Related RefSeq assembly (if any). Used only when IS_REFSEQ is 'N'.
TAX_ID The taxonomic identifier (e.g. 9606 for human) of the sequenced organism.
SCIENTIFIC_NAME The scientific name of the sequenced organism.
COMMON_NAME The common name of the sequenced organism.
STATUS_ID The status of the assembly set: public (4), suppressed (5), killed (6).
GENOME_REPRESENTATION Either full or partial.
ASSEMBLY_LEVEL The assembly level: contig, scaffold, chromosome, complete chromosome with gaps, gapless chromosome.
FIRST_CREATED The date when the assembly was created.
LAST_UPDATED The date when the assembly was last updated.
SAMPLE_ACC The sample used for the assembly.
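
The coded columns STATUS_ID and IS_PATCH can be turned into readable labels on the client side. The sketch below is in Python with hypothetical names; the status codes come from the column description above, and the treatment of IS_PATCH mirrors the decode expression used in the example queries (anything other than 'N' is shown as a minor release).

```python
# Status codes as listed for the STATUS_ID column.
STATUS_LABELS = {4: "public", 5: "suppressed", 6: "killed"}


def describe_status(status_id: int) -> str:
    """Return the label for a STATUS_ID value, flagging unknown codes."""
    return STATUS_LABELS.get(status_id, f"unknown ({status_id})")


def release_type(is_patch: str) -> str:
    """Map IS_PATCH to a release type: 'N' is major, anything else minor."""
    return "Major release" if is_patch == "N" else "Minor release"


print(describe_status(4), "/", release_type("Y"))
# public / Minor release
```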

GC_ASSEMBLY_UNIT table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1). RefSeq assembly sets have GCF_ prefix, GenBank/EMBL-Bank/DDBJ assembly sets have GCA_ prefix.
UNIT_ACC The assembly unit accession + version (e.g. GCA_000010415.1). RefSeq assembly units have GCF_ prefix, GenBank/EMBL-Bank/DDBJ assembly units have GCA_ prefix.
NAME A short name for the assembly unit.
LONG_NAME A long name for the assembly unit.
IS_PRIMARY 'Y' if the assembly unit is the primary assembly unit or 'N' if not.
IS_REFSEQ 'Y' if the assembly unit is a RefSeq unit or 'N' if not.
REFSEQ_UNIT_ACC Related RefSeq assembly unit (if any). Used only when IS_REFSEQ is 'N'.

GC_REPLICON table

Column name Description
REPLICON_ID A numeric unique identifier for the replicon. Used to link GC_REPLICON table to GC_PLACED, GC_GAP and GC_UNLOCALISED tables.
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1). RefSeq assembly sets have GCF_ prefix, GenBank/EMBL-Bank/DDBJ assembly sets have GCA_ prefix.
UNIT_ACC The assembly unit accession + version (e.g. GCA_000010415.1). RefSeq assembly units have GCF_ prefix, GenBank/EMBL-Bank/DDBJ assembly units have GCA_ prefix.
REPLICON_ACC The sequence accession + version (e.g. FR796453.1).
REPLICON_TYPE The chromosome type: proviral, genomic, macronuclear, mitochondrion, chromosome, plasmid, nucleomorph, cyanelle, plastid, kinetoplast, extrachrom, chromatophore, chloroplast or apicoplast.
REPLICON_ORDER The order of the chromosome.
NAME The name of the chromosome.
REFSEQ_ACC RefSeq accession (if any).

GC_WGS_SET table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
UNIT_ACC The assembly unit accession + version (e.g. GCA_000010415.1).
PREFIX The WGS set prefix.
VERSION The WGS set version.

GC_UNPLACED table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
UNIT_ACC The assembly unit accession + version (e.g. GCA_000010415.1).
ACC The sequence accession + version (e.g. AAGJ03183083.1).
REFSEQ_ACC RefSeq accession (if any).

GC_REGION table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
UNIT_ACC The accession + version of the assembly unit containing the region.
REPLICON_ACC The chromosome sequence accession + version.
NAME The name of the region.
REGION_FROM The start position of the region on the chromosome.
REGION_TO The end position of the region on the chromosome.
REPLICON_FROM The start position on the chromosome where the alt-locus is placed.
REPLICON_TO The end position on the chromosome where the alt-locus is placed.
REGION_TYPE The region type: alt-loci or patch.
ACC The region sequence accession + version.
ACC_FROM The start position of the (sub)sequence within the alt-locus sequence.
ACC_TO The end position of the (sub)sequence within the alt-locus sequence.
COMPLEMENT 'Y' if the region sequence is complemented or 'N' if not.

GC_PLACED table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
REPLICON_ID A numeric unique identifier for a replicon linking this table to GC_REPLICON table.
REPLICON_FROM The start position on the chromosome where the sequence is placed.
REPLICON_TO The end position on the chromosome where the sequence is placed.
ACC The placed sequence accession + version.
ACC_FROM The start position of the (sub)sequence within the placed sequence.
ACC_TO The end position of the (sub)sequence within the placed sequence.
COMPLEMENT 'Y' if the placed sequence is complemented or 'N' if not.
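
To illustrate how ACC_FROM, ACC_TO and COMPLEMENT might be combined, here is a small Python sketch that extracts the placed part of a sequence. It rests on two assumptions that the column descriptions do not state: coordinates are taken to be 1-based and inclusive, and COMPLEMENT = 'Y' is taken to mean the reverse complement. The function name is hypothetical.

```python
# Complement table for unambiguous bases only (an assumption for brevity).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def placed_subsequence(sequence: str, acc_from: int, acc_to: int,
                       complement: str) -> str:
    """Return the part of `sequence` that is placed on the replicon.

    Assumes 1-based inclusive coordinates and that complement == 'Y'
    means the reverse complement of the selected part.
    """
    if not 1 <= acc_from <= acc_to <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    part = sequence[acc_from - 1:acc_to]
    if complement == "Y":
        part = part.translate(_COMPLEMENT)[::-1]
    return part


print(placed_subsequence("GATTACA", 2, 4, "N"))  # ATT
print(placed_subsequence("GATTACA", 2, 4, "Y"))  # AAT
```

If the database uses a different coordinate convention, only the slice bounds need to change.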
REFSEQ_ACC RefSeq accession (if any).

GC_GAP table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
REPLICON_ID A numeric unique identifier for a replicon linking this table to GC_REPLICON table.
REPLICON_FROM The start position of the gap on the chromosome.
REPLICON_TO The end position of the gap on the chromosome.
LENGTH The gap length.
GAP_TYPE The gap type: telomere, heterochromatin, contig, clone, centromere or short-arm.

GC_UNLOCALISED table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
REPLICON_ID A numeric unique identifier for a replicon linking this table to GC_REPLICON table.
ACC The unlocalised sequence accession + version.
REFSEQ_ACC RefSeq accession (if any).

GC_WH_WGS_SET_STATS table

Column name Description
PREFIX The WGS set prefix.
VERSION The WGS set version.
SEQ_CNT The number of sequences in the WGS set.
BASE_CNT The number of base pairs in the WGS set.
CDS_CNT The number of protein coding (CDS) features in the WGS set.

GC_WH_SEQUENCE_STATS table

Column name Description
ACC The sequence accession + version (e.g. FR796452.1).
DATACLASS The dataclass of the sequence (CON, WGS, STD, HTG).
BASE_CNT The number of base pairs in the sequence.
CDS_CNT The number of protein coding (CDS) features in the sequence.
CONTIG_1_SEQ_CNT The number of scaffolds/contigs associated with the CON sequence.
CONTIG_1_CDS_CNT The number of protein coding (CDS) features in these scaffolds/contigs.
CONTIG_2_SEQ_CNT The number of contigs associated with scaffolds through an additional layer of CONs.
CONTIG_2_CDS_CNT The number of protein coding (CDS) features in these contigs.

GC_ASSEMBLY_SET_PROJECT table

Column name Description
SET_ACC The assembly set accession + version.
PROJECT_ACC The project accession.

GC_ASSEMBLY_UNIT_PROJECT table

Column name Description
SET_ACC The assembly set accession + version.
UNIT_ACC The assembly unit accession + version.
PROJECT_ACC The project accession.

GC_ASSEMBLY_SET_STATS table

Column name Description
SET_ACC The assembly set accession + version.
TYPE Statistics for unordered-scaf, all-scaf or aligned-scaf.
KEY The statistics name (e.g. n50).
VALUE The statistics value (e.g. 323391 for n50).

GC_ASSEMBLY_UNIT_STATS table

Column name Description
SET_ACC The assembly set accession + version.
UNIT_ACC The assembly unit accession + version.
TYPE Statistics for unordered-scaf, unplaced-scaf, unlocalized-scaf or all-scaf.
KEY The statistics name (e.g. n50).
VALUE The statistics value (e.g. 323391 for n50).
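
For readers unfamiliar with the n50 statistic used as the example key above, the following Python sketch computes it from a list of sequence lengths using the commonly used definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total length. How the values stored in the database are actually calculated is not stated here, so treat this as an illustration only.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the cumulative length first reaches half of
    the total length.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Total is 290, half is 145; 80 + 70 = 150 reaches it, so N50 is 70.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```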

GC_PROJECT table

Column name Description
PROJECT_ACC The project accession.
PROJECT_ID The NCBI project number.
PROJECT_TITLE A short descriptive title of the project.
PROJECT_NAME The project name.
PROJECT_DESCRIPTION A longer description of the project.
PROJECT_TYPE The type of the project: SUBMITTER, ORGANISM, UMBRELLA.
STRAIN The strain.
LOCUS_TAG The locus tag used to identify annotated features.
TAX_ID The taxonomic identifier (e.g. 9606 for human) of the sequenced organism.
SCIENTIFIC_NAME The scientific name of the sequenced organism.
IS_REFSEQ 'Y' if the project is a RefSeq project or 'N' if not.
BREED The sequenced breed (if applicable).
CULTIVAR The sequenced cultivar (if applicable).
ISOLATE The sequenced isolate (if applicable).

GC_ASSOCIATED_PROJECT table

Column name Description
PROJECT_ACC The project accession. In case of a hierarchical link, the child project accession.
ASSOCIATED_PROJECT_ACC The associated project accession. In case of a hierarchical link, the parent project accession.
ASSOCIATION_TYPE The type of the association: HIERARCHIAL, PEER.
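
The child/parent direction of hierarchical links can be made explicit by grouping rows by parent. The Python sketch below assumes rows arrive as (PROJECT_ACC, ASSOCIATED_PROJECT_ACC, ASSOCIATION_TYPE) tuples; the function name and the project accessions are hypothetical placeholders, not real records.

```python
from collections import defaultdict


def children_by_parent(rows):
    """Group child project accessions under their parent project.

    Each row is (project_acc, associated_project_acc, association_type).
    Only hierarchical links are used; peer links are ignored.
    """
    tree = defaultdict(list)
    for project_acc, associated_project_acc, association_type in rows:
        # Value spelled as listed in the ASSOCIATION_TYPE description.
        if association_type == "HIERARCHIAL":
            tree[associated_project_acc].append(project_acc)
    return dict(tree)


# Placeholder accessions for illustration only.
example_rows = [
    ("PRJ_CHILD_1", "PRJ_PARENT", "HIERARCHIAL"),
    ("PRJ_CHILD_2", "PRJ_PARENT", "HIERARCHIAL"),
    ("PRJ_CHILD_1", "PRJ_PEER", "PEER"),
]
print(children_by_parent(example_rows))
# {'PRJ_PARENT': ['PRJ_CHILD_1', 'PRJ_CHILD_2']}
```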

Example queries

Select all EMBL-Bank/GenBank assemblies for human

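-- Public (status_id = 4), non-RefSeq assembly sets for human (tax_id = 9606)
select set_acc as "Assembly Accession",
    set_chain as "Assembly Chain",
    set_version as "Assembly Version",
    name as "Assembly Name",
    -- is_patch = 'N' marks a major release; any other value is shown as a minor release
    decode(is_patch, 'N', 'Major release', 'Minor release') as "Release Type"
from gc_assembly_set
where tax_id = 9606 and is_refseq = 'N' and status_id = 4
order by set_acc;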

Rows returned 20 Sep 2011:

Assembly Accession Assembly Chain Assembly Version Assembly Name Release Type
GCA_000001405.1 GCA_000001405 1 GRCh37 Major release
GCA_000001405.2 GCA_000001405 2 GRCh37.p1 Minor release
GCA_000001405.3 GCA_000001405 3 GRCh37.p2 Minor release
GCA_000001405.4 GCA_000001405 4 GRCh37.p3 Minor release
GCA_000001405.5 GCA_000001405 5 GRCh37.p4 Minor release
GCA_000001405.6 GCA_000001405 6 GRCh37.p5 Minor release
GCA_000001405.7 GCA_000001405 7 GRCh37.p6 Minor release
GCA_000002115.1 GCA_000002115 1 Hs_Celera_WGSA Major release
GCA_000002125.1 GCA_000002125 1 Homo sapiens Major release
GCA_000002135.1 GCA_000002135 1 CRA_TCAGchr7v1 Major release
GCA_000002135.2 GCA_000002135 2 pre-CRA_TCAGchr7v2 Major release
GCA_000002135.3 GCA_000002135 3 CRA_TCAGchr7v2 Major release
GCA_000004845.1 GCA_000004845 1 YH1 Major release
GCA_000005465.1 GCA_000005465 1 BGIAF Major release
GCA_000181135.1 GCA_000181135 1 Watson-partial Major release
GCA_000185165.1 GCA_000185165 1 HsapALLPATHS1 Major release
GCA_000212995.1 GCA_000212995 1 Homo sapiens Major release
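
The query above relies on Oracle's decode function. For readers without access to the relational database, the following Python sketch runs an equivalent query on an in-memory SQLite table, replacing decode with a standard CASE expression. The table definition (column types in particular) is an assumption, and only three of the rows shown above are loaded; their is_patch, is_refseq, tax_id and status_id values are inferred from the query's filters and output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column types; only the columns used by the query are created.
conn.execute(
    """create table gc_assembly_set (
           set_acc text, set_chain text, set_version integer, name text,
           is_patch text, is_refseq text, tax_id integer, status_id integer)"""
)
conn.executemany(
    "insert into gc_assembly_set values (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("GCA_000001405.1", "GCA_000001405", 1, "GRCh37", "N", "N", 9606, 4),
        ("GCA_000001405.6", "GCA_000001405", 6, "GRCh37.p5", "Y", "N", 9606, 4),
        ("GCA_000004845.1", "GCA_000004845", 1, "YH1", "N", "N", 9606, 4),
    ],
)


def human_assemblies(connection):
    """Return public, non-RefSeq human assembly sets with a release type."""
    return connection.execute(
        """select set_acc, set_chain, set_version, name,
                  case is_patch when 'N' then 'Major release'
                                else 'Minor release' end
           from gc_assembly_set
           where tax_id = ? and is_refseq = 'N' and status_id = 4
           order by set_acc""",
        (9606,),
    ).fetchall()


for row in human_assemblies(conn):
    print(row)
```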

Select all EMBL-Bank/GenBank assemblies for GRCh37 assembly chain

select 
    set_acc as "Assembly Accession",
    set_chain as "Assembly Chain",
    set_version as "Assembly Version",
    gc_assembly_set.name as "Assembly Name",
    decode(is_patch,'N','Major release', 'Minor release') as "Release Type"
from gc_assembly_set
where set_chain = (
    select set_chain
    from gc_assembly_set
    where name = 'GRCh37' and is_refseq = 'N' and status_id = 4)
order by set_acc;

Rows returned 20 Sep 2011:

Assembly Accession Assembly Chain Assembly Version Assembly Name Release Type
GCA_000001405.1 GCA_000001405 1 GRCh37 Major release
GCA_000001405.2 GCA_000001405 2 GRCh37.p1 Minor release
GCA_000001405.3 GCA_000001405 3 GRCh37.p2 Minor release
GCA_000001405.4 GCA_000001405 4 GRCh37.p3 Minor release
GCA_000001405.5 GCA_000001405 5 GRCh37.p4 Minor release
GCA_000001405.6 GCA_000001405 6 GRCh37.p5 Minor release
GCA_000001405.7 GCA_000001405 7 GRCh37.p6 Minor release

Select all regions for GRCh37.p5 assembly

select 
    name as "Region Name",
    (select name from gc_replicon where gc_replicon.replicon_acc = gc_region.replicon_acc and rownum < 2) as "Chromosome",
    region_from as "Region Start Position",
    region_to as "Region End Position",
    replicon_from as "Alt-Locus Start Position",
    replicon_to as "Alt-Locus End Position",
    decode(region_type, 'alt-loci', 'Alternate Locus', 'Patch') as "Alt-Locus Type",
    acc as "Alt-Locus Accession"
from gc_region
where set_acc = 'GCA_000001405.6'
order by acc;

Rows returned 20 Sep 2011:

Region Name Chromosome Region Start Position Region End Position Alt-Locus Start Position Alt-Locus End Position Alt-Locus Type Alt-Locus Accession
MHC 6 28477797 33448354 28696604 33335493 Alternate Locus GL000250.1
MHC 6 28477797 33448354 28477797 33351542 Alternate Locus GL000251.1
MHC 6 28477797 33448354 28696604 33329076 Alternate Locus GL000252.1
MHC 6 28477797 33448354 28696604 33225977 Alternate Locus GL000253.1
MHC 6 28477797 33448354 28696604 33359642 Alternate Locus GL000254.1
MHC 6 28477797 33448354 28696604 33379750 Alternate Locus GL000255.1
MHC 6 28477797 33448354 28659143 33448354 Alternate Locus GL000256.1
UGT2B17 4 69170077 69878175 69170077 69878175 Alternate Locus GL000257.1
MAPT 17 43384864 44913631 43384864 44913631 Alternate Locus GL000258.1
SMA 5 68512646 70910270 68512646 70910270 Patch GL339449.2
ABO 9 136049442 136369192 136049442 136369192 Patch GL339450.1
REGION1 1 248865779 249098883 248865779 248908210 Patch GL383516.1
REGION1 1 248865779 249098883 249058211 249098883 Patch GL383517.1
REGION2 1 153673007 153838214 153673007 153838214 Patch GL383518.1
MTX1 1 155180173 155275036 155180173 155275036 Patch GL383519.1
REGION3 1 198339213 198694304 198339213 198694304 Patch GL383520.1
REGION4 2 36453102 36590458 36453102 36590458 Patch GL383521.1
SPC25 2 169686873 169793704 169686873 169793704 Patch GL383522.1
VPRBP 3 51416109 51584055 51416109 51584055 Patch GL383523.1
DNAH12 3 57369478 57399969 57369478 57399969 Patch GL383524.1
SLC25A26 3 66270271 66308065 66270271 66308065 Patch GL383525.1
REGION5 3 151307154 151477286 151307154 151477286 Patch GL383526.1
REGION7 4 156756530 156908416 156756530 156908416 Patch GL383527.1
REGION6 4 34519577 34885480 34519577 34885480 Patch GL383528.1
LPHN3 4 62777687 62877254 62777687 62877254 Patch GL383529.1
MCTP1 5 94505561 94590195 94505561 94590195 Patch GL383530.1
REGION8 5 161800673 161968654 161800673 161968654 Patch GL383531.1
REGION9 5 12681538 12744122 12681538 12744122 Patch GL383532.1
REGION10 6 80059725 80156628 80059725 80156628 Patch GL383533.1
REGION11 7 141333209 141446583 141333209 141446583 Patch GL383534.1
EPPK1_SPATC1 8 144743526 145146062 144743526 145146062 Patch GL383535.1
SCXB 8 145285645 145659901 145285645 145659901 Patch GL383536.1
REGION12 9 139136890 139252828 139136890 139166997 Patch GL383537.1
REGION12 9 139136890 139252828 139216998 139252828 Patch GL383538.1
REGION14 9 7428994 7577169 7428994 7577169 Patch GL383539.1
REGION13 9 72028659 72092013 72028659 72092013 Patch GL383540.1
MAMDC2 9 72639029 72804234 72639029 72804234 Patch GL383541.1
REGION15 9 90793962 90842895 90793962 90842895 Patch GL383542.1
FAM23A_MRC1 10 17613209 18252930 17613209 18252930 Patch GL383543.1
REGION17 10 133258319 133381404 133258319 133381404 Patch GL383544.1
REGION16 10 27574584 27706537 27574584 27706537 Patch GL383545.1
REGION18 10 45670681 45964419 45670681 45964419 Patch GL383546.1
REGION19 11 25191953 25340626 25191953 25340626 Patch GL383547.1
GALNT9 12 132806993 132967794 132806993 132967794 Patch GL383548.1
REGION21 12 28148967 28263711 28148967 28263711 Patch GL383549.1
REGION22 12 58326520 58486538 58326520 58486538 Patch GL383550.1
REGION20 12 126711744 126890020 126711744 126890020 Patch GL383551.1
REGION23 12 59323046 59454651 59323046 59454651 Patch GL383552.1
REGION24 12 101503370 101652073 101503370 101652073 Patch GL383553.1
REGION25 15 28557187 28842093 28557187 28842093 Patch GL383554.1
MEGF11 15 66200521 66577156 66200521 66577156 Patch GL383555.1
REGION26 16 55822434 56002460 55822434 56002460 Patch GL383556.1
SNTB2 16 69174054 69258593 69174054 69258593 Patch GL383557.1
PECAM1 17 62273514 62649312 62273514 62649312 Patch GL383558.1
SOCS7 17 36372617 36711255 36372617 36711255 Patch GL383559.1
MYO19 17 34442621 35005379 34442621 35005379 Patch GL383560.1
REGION27 17 21250948 21566608 21250948 21566608 Patch GL383561.1
FAM101B 17 252429 296626 252429 296626 Patch GL383562.1
RPH3AL 17 1 252428 1 252428 Patch GL383563.1
REGION29 17 39463362 39589187 39463362 39589187 Patch GL383564.1
REGION28 17 68302419 68520360 68302419 68520360 Patch GL383565.1
REGION30 17 75216812 75295408 75216812 75295408 Patch GL383566.1
REGION31 18 47818564 48101162 47818564 48101162 Patch GL383567.1
REGION34 18 70600357 70692016 70600357 70692016 Patch GL383568.1
REGION35 18 76253467 76412030 76253467 76412030 Patch GL383569.1
REGION32 18 49189306 49348012 49189306 49348012 Patch GL383570.1
REGION33 18 65090960 65219788 65090960 65219788 Patch GL383571.1
REGION36 18 76694886 76848997 76694886 76848997 Patch GL383572.1
ZNF66 19 20845947 21225187 20845947 21225187 Patch GL383573.1
REGION38 19 34643165 34791855 34643165 34791855 Patch GL383574.1
REGION37 19 21831092 21991838 21831092 21991838 Patch GL383575.1
REGION39 19 22303652 22483468 22303652 22483468 Patch GL383576.1
REGION40 20 17751747 17874115 17751747 17874115 Patch GL383577.1
REGION41 21 15796462 15847792 15796462 15847792 Patch GL383578.1
REGION42 21 23474793 23669288 23474793 23669288 Patch GL383579.1
REGION43 21 34141390 34210247 34141390 34210247 Patch GL383580.1
TMEM50B 21 34777735 34884866 34777735 34884866 Patch GL383581.1
CYP2D6 22 42477964 42648568 42477964 42648568 Patch GL383582.1
APOBEC 22 39280299 39407165 39280299 39407165 Patch GL383583.1
REGION44 2 149790583 149880633 149790583 149880633 Patch GL582966.1
REGION45 4 75382210 75689879 75382210 75689879 Patch GL582967.1
SH2B2 7 101718951 102072447 101718951 102072447 Patch GL582968.1
REGION46 7 57342227 57586048 57342227 57586048 Patch GL582969.1
REGION48 7 56835596 57190579 56835596 57190579 Patch GL582970.1
TRB 7 141557850 142778624 141557850 142778624 Patch GL582971.1
REGION47 7 98260131 98556215 98260131 98556215 Patch GL582972.1
REGION49 11 49862648 50121284 49862648 50121284 Patch GL582973.1
REGION50 12 60001 282464 60001 282464 Patch GL582974.1
REGION51 13 115085142 115109878 115085142 115109878 Patch GL582975.1
TTC25 17 39869611 40277911 39869611 40277911 Patch GL582976.1
REGION52 19 20193557 20845946 20193557 20845946 Patch GL582977.1
REGION53 20 61031539 61267733 61263370 61267733 Patch GL582978.1
REGION53 20 61031539 61267733 61031539 61213369 Patch GL582979.1
REGION54 2 243059660 243189373 243059660 243189373 Patch GL877870.2
REGION55 2 95326172 95618108 95326172 95618108 Patch GL877871.1
REGION56 4 190828226 191044276 190828226 191044276 Patch GL877872.1
REGION57 10 60001 224405 60001 224405 Patch GL877873.1
OLFACTORY_REGION_1 11 55956728 56641043 55956728 56641043 Patch GL877874.1
REGION50 12 60001 282464 60001 282464 Patch GL877875.1
REGION58 12 10953894 11330216 10953894 11330216 Patch GL877876.1
REGION59 X 803878 1227822 803878 1227822 Patch GL877877.1
REGION60 1 31872759 32017063 31872759 32017063 Patch GL949741.1
REGION61 5 29036605 29254576 29036605 29254576 Patch GL949742.1
REGION62 8 1965390 2498608 1965390 2498608 Patch GL949743.1
REGION63 11 69614786 69922571 69614786 69922571 Patch GL949744.1
REGION58 12 10953894 11330216 10953894 11330216 Patch GL949745.1
LRC 19 54528888 55595686 54528888 55595686 Patch GL949746.1
LRC 19 54528888 55595686 54528888 55595686 Patch GL949747.1
LRC 19 54528888 55595686 54528888 55595686 Patch GL949748.1
LRC 19 54528888 55595686 54528888 55595686 Patch GL949749.1
LRC 19 54528888 55595686 54528888 55595686 Patch GL949750.1
LRC 19 54528888 55595686 54528888 55595686 Patch GL949751.1
LRC 19 54528888 55595686 54528888 55595686 Patch GL949752.1
LRC 19 54528888 55595686 54528888 55595686 Patch GL949753.1
PAR#1 X 60001 2699520     Patch  
PAR#2 X 154931044 155260560     Patch  
PAR#1 Y 10001 2649520     Patch  
PAR#2 Y 59034050 59363566     Patch  

Select all chromosomes for GRCh37.p5 assembly

select 
    name as "Chromosome",
    replicon_type as "Chromosome Type",
    replicon_acc as "Sequence Accession"
from gc_replicon
where set_acc = 'GCA_000001405.6';

Rows returned 20 Sep 2011:

Chromosome Chromosome Type Sequence Accession
1 chromosome CM000663.1
2 chromosome CM000664.1
3 chromosome CM000665.1
4 chromosome CM000666.1
5 chromosome CM000667.1
6 chromosome CM000668.1
7 chromosome CM000669.1
8 chromosome CM000670.1
9 chromosome CM000671.1
10 chromosome CM000672.1
11 chromosome CM000673.1
12 chromosome CM000674.1
13 chromosome CM000675.1
14 chromosome CM000676.1
15 chromosome CM000677.1
16 chromosome CM000678.1
17 chromosome CM000679.1
18 chromosome CM000680.1
19 chromosome CM000681.1
20 chromosome CM000682.1
21 chromosome CM000683.1
22 chromosome CM000684.1
X chromosome CM000685.1
Y chromosome CM000686.1
MT mitochondrion J01415.2
