Genome assembly database

The genome assembly database contains detailed information about genome assemblies for eukaryota, bacteria and archaea. The scope of the genome collections database does not extend to viruses, viroids and bacteriophage.

The following information is available on the genome assembly database:

Definition of terms

Term Description
Chromosome An assembled pseudomolecule that represents a biological chromosome. Most of the chromosome is expected to be represented by sequenced bases, although some gaps may still be present.
Placed sequence A sequence that has a known chromosomal location and orientation.
Unlocalized sequence A sequence that is associated with a specific chromosome without being ordered or oriented on that chromosome.
Unplaced sequence A sequence that is not associated with any specific chromosome.
Genomic region A named part of the primary assembly for which alternate loci or patches are available.
Alternate locus A sequence that provides an alternate representation of a locus. Alternate loci are collected into additional assembly units (i.e. not in the primary assembly).
Patch A sequence that provides a fix and/or novel sequence to the genome assembly.
Fix patch Sequence corrections or assembly gap reductions for the primary assembly introduced in a minor release. Fix patches are expected to be incorporated into the primary or alternate loci assembly units in the next major release.
Novel patch Novel sequences for the primary assembly introduced in a minor release. Novel patches are expected to be incorporated into the primary assembly unit in the next major release.
Assembly A set of chromosome assemblies, unlocalized and unplaced sequences, alternate loci and patches that represent a genome.
Assembly unit An assembly is organized into assembly units.
Primary assembly unit Assembly unit that contains the set of assembled chromosomes, unlocalized and unplaced sequences that represent a non-redundant genome. Alternate loci and patches are not included in the primary assembly unit.
Major release A release of a genome assembly that contains a primary assembly and alternate loci, e.g. GRCh37.
Minor release A release of a genome assembly that adds patches to the major release, e.g. GRCh37.p5.
Assembly chain The major and minor releases form an assembly chain. For example, the assembly accession for the GRCh37 major release is GCA_000001405.1. The assembly accession consists of two parts: the assembly chain accession (GCA_000001405) and the assembly version (1). The assembly version is incremented for each new release in the chain, major or minor, while the assembly chain accession remains unchanged.
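
The split of an assembly accession into chain accession and version can be sketched in Python. This is a minimal illustration, not part of the database: the function name split_assembly_accession and the regular expression are assumptions, with the pattern (GCA_ or GCF_ prefix followed by nine digits) inferred only from the example accessions in this document.

```python
import re

# Assumed pattern, inferred from examples such as GCA_000001405.1:
# GCA_/GCF_ prefix, nine digits, a dot, then an integer version.
ACCESSION_RE = re.compile(r"(GC[AF]_\d{9})\.(\d+)")


def split_assembly_accession(accession):
    """Return (chain_accession, version) for a versioned assembly accession."""
    match = ACCESSION_RE.fullmatch(accession)
    if match is None:
        raise ValueError(f"not a versioned assembly accession: {accession!r}")
    return match.group(1), int(match.group(2))


chain, version = split_assembly_accession("GCA_000001405.1")
print(chain, version)  # GCA_000001405 1
```

Returning the version as an integer lets releases in the same chain be ordered numerically, so that version 10 sorts after version 9.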

Assembly levels

Contig

The highest level of the primary assembly unit consists of contigs.

The contigs are available from gc_unlocalised, gc_unplaced, gc_placed and gc_wgs_set tables. Please note that:

  • contigs in the gc_wgs_set table may also appear in the gc_unlocalised, gc_unplaced and gc_placed tables
  • some contigs appear only in the gc_wgs_set table
  • some contigs appear only in the gc_unlocalised, gc_unplaced or gc_placed tables
  • the gc_wgs_set table contains only the WGS set prefix, not the individual contigs

Scaffold

The highest level of the primary assembly unit consists of gapped contigs (scaffolds).

The scaffolds are available from gc_replicon, gc_unplaced and gc_unlocalised tables. Please note that scaffolds in gc_replicon may or may not have sequence accession numbers associated with them. Only scaffolds with accession numbers in the gc_replicon table are available as sequence entries.

Chromosome

The highest level of the primary assembly unit contains assembled chromosomes. There may be unlocalised and unplaced scaffolds, and the chromosomes may contain gaps.

The chromosomes are available from the gc_replicon table. The unlocalised and unplaced scaffolds are available from the gc_unlocalised and gc_unplaced tables. Please note that because there can be unlocalised sequences on a chromosome, some chromosomes in gc_replicon might not have accession numbers.

Formerly, there were three different levels containing chromosomes: chromosome, complete chromosome with gaps, and gapless chromosome. These have now been merged into a single assembly level.

Complete genome

Every chromosome in the assembly must be gapless, there must be no unlocalised or unplaced sequences, and the genome representation must be "full".

Plasmid sequences are the exception: they can have gaps and unlocalised sequences.
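
The conditions for the complete genome level can be written as a small check. The sketch below is illustrative only: the Replicon class, its fields and the is_complete_genome function are hypothetical names, and it assumes that every replicon whose type is not "plasmid" is held to the chromosome rule.

```python
from dataclasses import dataclass


@dataclass
class Replicon:
    # Hypothetical record; the fields are not columns of any table.
    name: str
    replicon_type: str      # e.g. "chromosome" or "plasmid"
    gap_count: int          # number of gaps on the replicon
    unlocalised_count: int  # unlocalised sequences tied to the replicon


def is_complete_genome(replicons, unplaced_count, genome_representation):
    """Apply the complete genome conditions described above."""
    if genome_representation != "full":
        return False
    if unplaced_count > 0:
        return False
    for replicon in replicons:
        if replicon.replicon_type == "plasmid":
            continue  # plasmids may have gaps and unlocalised sequences
        if replicon.gap_count > 0 or replicon.unlocalised_count > 0:
            return False
    return True


genome = [
    Replicon("1", "chromosome", 0, 0),
    Replicon("pA", "plasmid", 2, 1),
]
print(is_complete_genome(genome, 0, "full"))  # True
```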

Relational database

The genome collections information is made available as a relational database to Wellcome Trust Genome Campus (Hinxton) users. Please contact datasubs@ebi.ac.uk for further information.

Genome assembly DB schema

List of tables

Table name Description
GC_ASSEMBLY_SET Genome assembly sets. Each set is identified by set accession + version stored in column SET_ACC (e.g. GCA_000013085.1). Each assembly set consists of one or more assembly units. RefSeq assembly sets have GCF_ prefix, GenBank/EMBL-Bank/DDBJ assembly sets have GCA_ prefix.
GC_ASSEMBLY_UNIT Genome assembly units. Each unit is identified by unit accession + version stored in column UNIT_ACC (e.g. GCA_000010415.1). Each assembly unit is associated with one or more assembly sets. RefSeq assembly units have GCF_ prefix, GenBank/EMBL-Bank/DDBJ assembly units have GCA_ prefix.
GC_REPLICON Assembled chromosomes associated with an assembly unit.
GC_WGS_SET Whole genome shotgun (WGS) sets associated with an assembly unit.
GC_UNPLACED Unplaced sequences associated with an assembly unit. Nothing is known about the placement of these sequences.
GC_REGION Named chromosomal regions.
GC_ALIGNED Sequences that have been aligned on named chromosomal regions.
GC_PLACED Placed scaffolds or contigs on replicons.
GC_GAP Remaining gaps on replicons.
GC_UNLOCALISED Unlocalised sequences associated with a replicon. The exact placement of these sequences is not known.
GC_PROJECT Projects.
GC_ASSOCIATED_PROJECT Project associations (parent/child or peer associations).
GC_PROJECT_LOCUS_TAG Project locus tags.
GC_WH_WGS_SET_STATS Basic nucleotide and protein sequence statistics associated with WGS sets. (No longer refreshed as of 2016; to be removed soon.)
GC_WH_SEQUENCE_STATS Basic nucleotide and protein sequence statistics associated with genome assembly sequences. (No longer refreshed as of 2016; to be removed soon.)
GC_WH_CDS_STATS Basic protein sequence statistics associated with genome assembly sets.
GC_ASSEMBLY_SET_PROJECT Associations between assembly sets and projects.
GC_ASSEMBLY_UNIT_PROJECT Associations between assembly units and projects.
GC_ASSEMBLY_SET_STATS Assembly statistics for assembly sets.
GC_PROKARYOTE Summary table including some sequence and feature statistics for a subset of the prokaryote assembly sets.
GC_EUKARYOTE Summary table including some sequence and feature statistics for a subset of the eukaryote assembly sets.
GC_VIRUS Summary table including some sequence and feature statistics for some viral genomes.

Table columns

GC_ASSEMBLY_SET table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
SET_CHAIN The assembly set accession without version (e.g. GCA_000013085).
SET_VERSION The assembly set version.
NAME A short name for the assembly set.
LONG_NAME A long name for the assembly set.
IS_PATCH 'Y' if the assembly set is a patch assembly or 'N' if not.
TAX_ID The taxonomic identifier (e.g. 9606 for human) of the sequenced organism.
SCIENTIFIC_NAME The scientific name of the sequenced organism.
COMMON_NAME The common name of the sequenced organism.
STRAIN The strain of the sequenced organism.
STATUS_ID The status of the assembly set: public (4), suppressed (5), killed (6).
GENOME_REPRESENTATION Either full or partial.
ASSEMBLY_LEVEL The assembly level: contig, scaffold, chromosome or complete genome.
FIRST_CREATED The date when the assembly was created.
LAST_UPDATED The date when the assembly was last updated.
SAMPLE_ACC The sample used for the assembly.
CENTER_NAME The name of the submitting centre.
SUBMITTED_DATE The date the assembly was submitted.

GC_ASSEMBLY_UNIT table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1). RefSeq assembly sets have GCF_ prefix, GenBank/EMBL-Bank/DDBJ assembly sets have GCA_ prefix.
UNIT_ACC The assembly unit accession + version (e.g. GCA_000010415.1). RefSeq assembly units have GCF_ prefix, GenBank/EMBL-Bank/DDBJ assembly units have GCA_ prefix.
NAME A short name for the assembly unit.
LONG_NAME A long name for the assembly unit.
IS_PRIMARY 'Y' if the assembly unit is the primary assembly unit or 'N' if not.

GC_REPLICON table

Column name Description
REPLICON_ID A numeric unique identifier for the replicon. Used to link GC_REPLICON table to GC_PLACED, GC_GAP and GC_UNLOCALISED tables.
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1). RefSeq assembly sets have GCF_ prefix, GenBank/EMBL-Bank/DDBJ assembly sets have GCA_ prefix.
UNIT_ACC The assembly unit accession + version (e.g. GCA_000010415.1). RefSeq assembly units have GCF_ prefix, GenBank/EMBL-Bank/DDBJ assembly units have GCA_ prefix.
REPLICON_ACC The sequence accession + version (e.g. FR796453.1).
REPLICON_TYPE The chromosome type: proviral, genomic, macronuclear, mitochondrion, chromosome, plasmid, nucleomorph, cyanelle, plastid, kinetoplast, extrachrom, chromatophore, chloroplast or apicoplast.
REPLICON_ORDER The order of the chromosome.
NAME The name of the chromosome.
LENGTH The length of the chromosome.

GC_WGS_SET table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
UNIT_ACC The assembly unit accession + version (e.g. GCA_000010415.1).
PREFIX The WGS set prefix.
VERSION The WGS set version.

GC_UNPLACED table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
UNIT_ACC The assembly unit accession + version (e.g. GCA_000010415.1).
ACC The sequence accession + version (e.g. AAGJ03183083.1).
NAME The name of the sequence.
LENGTH The length of the sequence.

GC_REGION table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
UNIT_ACC The assembly unit accession + version (e.g. GCA_000010415.1).
REPLICON_ID The chromosome sequence ID.
NAME The name of the region.
REGION_TYPE The region type: Genomic region, Heterochromatin region or Centromere region.
REPLICON_FROM This is the begin position where the region is placed on the chromosome.
REPLICON_TO This is the end position where the region is placed on the chromosome.

GC_ALIGNED table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
UNIT_ACC The assembly unit accession + version containing the region.
REPLICON_ID The chromosome sequence ID.
ACC The sequence accession + version.
NAME The name of the sequence.
LENGTH The length of the sequence.
PATCH_TYPE The sequence patch type: novel or fix.
ACC_FROM This is the (sub)sequence begin position of the alt-locus sequence.
ACC_TO This is the (sub)sequence end position of the alt-locus sequence.
COMPLEMENT 'Y' if the sequence is complemented or 'N' if not.
REPLICON_FROM This is the begin position where the alt-locus is placed on the chromosome.
REPLICON_TO This is the end position where the alt-locus is placed on the chromosome.

GC_PLACED table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
UNIT_ACC The assembly unit accession + version (e.g. GCA_000010415.1).
REPLICON_ID A numeric unique identifier for a replicon linking this table to GC_REPLICON table.
REPLICON_FROM This is the begin position where the sequence is placed on the chromosome.
REPLICON_TO This is the end position where the sequence is placed on the chromosome.
ACC The placed sequence accession + version.
ACC_FROM This is the (sub)sequence begin position of the placed sequence.
ACC_TO This is the (sub)sequence end position of the placed sequence.
COMPLEMENT 'Y' if the placed sequence is complemented or 'N' if not.

GC_GAP table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
UNIT_ACC The assembly unit accession + version (e.g. GCA_000010415.1).
REPLICON_ID A numeric unique identifier for a replicon linking this table to GC_REPLICON table.
REPLICON_FROM This is the begin position where the gap is on the chromosome.
REPLICON_TO This is the end position where the gap is on the chromosome.
LENGTH The gap length.
GAP_TYPE The gap type: telomere, heterochromatin, contig, clone, centromere or short-arm.
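
The link between GC_GAP and GC_REPLICON can be tried out without database access. The sketch below builds two cut-down tables in an in-memory SQLite database from Python and counts gaps per replicon. It assumes the linking column is named REPLICON_ID in both tables; the rows, including the accession GCA_999999999.1, are invented for illustration, and the production database is not SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table gc_replicon (
        replicon_id integer primary key, set_acc text, name text, length integer);
    create table gc_gap (
        replicon_id integer, replicon_from integer, replicon_to integer,
        length integer, gap_type text);
    -- Invented sample rows, not real data.
    insert into gc_replicon values
        (1, 'GCA_999999999.1', '1', 1000),
        (2, 'GCA_999999999.1', '2', 800);
    insert into gc_gap values
        (1, 101, 150, 50, 'contig'),
        (1, 501, 600, 100, 'centromere');
""")

# A left join keeps replicons that have no gaps at all.
rows = conn.execute("""
    select r.name,
           count(g.replicon_id) as gap_cnt,
           coalesce(sum(g.length), 0) as gap_bases
    from gc_replicon r
    left join gc_gap g on g.replicon_id = r.replicon_id
    where r.set_acc = ?
    group by r.replicon_id, r.name
    order by r.replicon_id
""", ("GCA_999999999.1",)).fetchall()

print(rows)  # [('1', 2, 150), ('2', 0, 0)]
```

The same join pattern should apply to GC_PLACED and GC_UNLOCALISED, which are described as linking to GC_REPLICON through the same identifier.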

GC_UNLOCALISED table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
UNIT_ACC The assembly unit accession + version (e.g. GCA_000010415.1).
REPLICON_ID A numeric unique identifier for a replicon linking this table to GC_REPLICON table.
ACC The unlocalised sequence accession + version.
NAME The name of the sequence.
LENGTH The length of the sequence.

GC_WH_WGS_SET_STATS table

Column name Description
PREFIX The WGS set prefix.
VERSION The WGS set version.
SEQ_CNT The number of sequences in the WGS set.
BASE_CNT The number of base pairs in the WGS set.
CDS_CNT The number of protein coding (CDS) features in the WGS set.

GC_WH_SEQUENCE_STATS table

Column name Description
ACC The sequence accession + version (e.g. FR796452.1).
DATACLASS The dataclass of the sequence (CON, WGS, STD, HTG).
BASE_CNT The number of base pairs in the sequence.
CDS_CNT The number of protein coding (CDS) features in the sequence.
CONTIG_1_SEQ_CNT The number of scaffolds/contigs associated with the CON sequence.
CONTIG_1_CDS_CNT The number of protein coding (CDS) features in these scaffolds/contigs.
CONTIG_2_SEQ_CNT The number of contigs associated with scaffolds through an additional layer of CONs.
CONTIG_2_CDS_CNT The number of protein coding (CDS) features in these contigs.

GC_WH_CDS_STATS table

Column name Description
SET_ACC The assembly set accession + version (e.g. GCA_000013085.1).
CDS_CNT The number of protein coding (CDS) features in the assembly set.
WGS_CDS The number of protein coding (CDS) features in the assembly set's WGS sequences.
NONWGS_CDS The number of protein coding (CDS) features in the assembly set's non-WGS sequences.

GC_ASSEMBLY_SET_PROJECT table

Column name Description
SET_ACC The assembly set accession + version.
PROJECT_ACC The project accession.

GC_ASSEMBLY_UNIT_PROJECT table

Column name Description
SET_ACC The assembly set accession + version.
UNIT_ACC The assembly unit accession + version.
PROJECT_ACC The project accession.

GC_ASSEMBLY_SET_STATS table

Column name Description
SET_ACC The assembly set accession + version.
TYPE Statistics for unordered-scaf, all-scaf or aligned-scaf.
KEY The statistics name (e.g. n50).
VALUE The statistics value (e.g. 323391 for n50).
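
Because GC_ASSEMBLY_SET_STATS stores one statistic per row, it is often convenient to pivot the rows into one dictionary per assembly set and statistics type. The sketch below is a minimal illustration in Python; the function name pivot_stats is hypothetical, and the sample rows are invented apart from the n50 value quoted above ("count" is a made-up key).

```python
from collections import defaultdict

# Rows shaped like (SET_ACC, TYPE, KEY, VALUE); invented sample data.
sample_rows = [
    ("GCA_999999999.1", "all-scaf", "n50", 323391),
    ("GCA_999999999.1", "all-scaf", "count", 120),
    ("GCA_999999999.1", "unordered-scaf", "n50", 51200),
]


def pivot_stats(rows):
    """Group key/value rows into {(set_acc, type): {key: value}}."""
    stats = defaultdict(dict)
    for set_acc, stat_type, key, value in rows:
        stats[(set_acc, stat_type)][key] = value
    return dict(stats)


stats = pivot_stats(sample_rows)
print(stats[("GCA_999999999.1", "all-scaf")]["n50"])  # 323391
```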

GC_PROKARYOTE table

Column name Description
SET_ACC The assembly set accession + version.
TAX_ID The taxonomic identifier (e.g. 9606 for human) of the sequenced organism.
ORGANISM The scientific name of the sequenced organism.
TAXON_GROUP The taxonomic group of the sequenced organism.
TAXON_SUBGROUP The taxonomic subgroup of the sequenced organism.
PROJECT_ACC The project accession.
SAMPLE_ACC The sample used for the assembly.
SIZE_MB The sequence length (in MB) of the assembly.
GC_CONTENT The percent GC content of the assembly.
WGS_SET The prefix and build of the WGS set in the assembly.
SCAFFOLD_CNT The number of scaffolds in the assembly set.
GENE_CNT The number of gene features in the assembly set.
PROTEIN_CNT The number of protein coding (CDS) features in the assembly set.
ASSEMBLY_STATUS Equivalent to the assembly_level from the gc_assembly_set table: Contig, Scaffold, Chromosome or Complete Genome.
CENTER_NAME The name of the submitting centre.
FIRST_PUBLIC The date when the assembly was first made public.
LAST_UPDATED The date when the assembly was last updated.

GC_EUKARYOTE table

Column name Description
SET_ACC The assembly set accession + version.
TAX_ID The taxonomic identifier (e.g. 9606 for human) of the sequenced organism.
ORGANISM The scientific name of the sequenced organism.
TAXON_GROUP The taxonomic group of the sequenced organism.
TAXON_SUBGROUP The taxonomic subgroup of the sequenced organism.
PROJECT_ACC The project accession.
SAMPLE_ACC The sample used for the assembly.
SIZE_MB The sequence length (in MB) of the assembly.
GC_CONTENT The percent GC content of the assembly.
WGS_SET The prefix and build of the WGS set in the assembly.
CHROMOSOME_CNT The number of chromosomes in the assembly set.
ORGANELLE_CNT The number of organelles in the assembly set.
PLASMID_CNT The number of plasmids in the assembly set.
SCAFFOLD_CNT The number of scaffolds in the assembly set.
GENE_CNT The number of gene features in the assembly set.
PROTEIN_CNT The number of protein coding (CDS) features in the assembly set.
ASSEMBLY_STATUS Equivalent to the assembly_level from the gc_assembly_set table: Contig, Scaffold, Chromosome or Complete Genome.
CENTER_NAME The name of the submitting centre.
FIRST_PUBLIC The date when the assembly was first made public.
LAST_UPDATED The date when the assembly was last updated.

GC_VIRUS table

Column name Description
TAX_ID The taxonomic identifier (e.g. 9606 for human) of the sequenced organism.
ORGANISM The scientific name of the sequenced organism.
TAXON_GROUP The taxonomic group of the sequenced organism.
TAXON_SUBGROUP The taxonomic subgroup of the sequenced organism.
HOST The taxonomic host group of the sequenced viral organism.
PROJECT_ACC The project accession.
SIZE_KB The sequence length (in kB) of the assembly.
GC_CONTENT The percent GC content of the assembly.
SEGMENT_CNT The number of viral segments in the assembly set.
GENE_CNT The number of gene features in the assembly set.
PROTEIN_CNT The number of protein coding (CDS) features in the assembly set.
ASSEMBLY_STATUS Equivalent to the assembly_level from the gc_assembly_set table: Contig, Scaffold, Chromosome or Complete Genome.
FIRST_PUBLIC The date when the assembly was first made public.
LAST_UPDATED The date when the assembly was last updated.

GC_PROJECT table

Column name Description
PROJECT_ACC The project accession.
PROJECT_ID The NCBI project number.
PROJECT_TITLE A short descriptive title of the project.
PROJECT_NAME The project name.
PROJECT_DESCRIPTION A longer description of the project.
PROJECT_TYPE The type of the project: SUBMITTER, ORGANISM, UMBRELLA.
STRAIN The strain.
LOCUS_TAG The locus tag used to identify annotated features.
TAX_ID The taxonomic identifier (e.g. 9606 for human) of the sequenced organism.
SCIENTIFIC_NAME The scientific name of the sequenced organism.
IS_REFSEQ 'Y' if the project is a RefSeq project or 'N' if not.
BREED The sequenced breed (if applicable).
CULTIVAR The sequenced cultivar (if applicable).
ISOLATE The sequenced isolate (if applicable).

GC_ASSOCIATED_PROJECT table

Column name Description
PROJECT_ACC The project accession. For a hierarchical link, this is the child project accession.
ASSOCIATED_PROJECT_ACC The associated project accession. For a hierarchical link, this is the parent project accession.
ASSOCIATION_TYPE The type of the association: HIERARCHIAL, PEER.

GC_PROJECT_LOCUS_TAG table

Column name Description
PROJECT_ACC The project accession.
LOCUS_TAG The locus tag used to identify annotated features.
SAMPLE_ACC The sample accession for the locus tag.
SET_CHAIN The assembly set chain for the locus tag.

Example queries

Select all EMBL-Bank/GenBank assemblies for human

-- tax_id 9606 is human; status_id 4 restricts the result to public assembly sets
select
    set_acc as "Assembly Accession",
    set_chain as "Assembly Chain",
    set_version as "Assembly Version",
    name as "Assembly Name",
    decode(is_patch, 'N', 'Major release', 'Minor release') as "Release Type"
from gc_assembly_set
where tax_id = 9606 and status_id = 4
order by set_chain, set_version;

Rows returned 26 Jan 2016:

Assembly Accession Assembly Chain Assembly Version Assembly Name Release Type
GCA_000001405.1 GCA_000001405 1 GRCh37 Major release
GCA_000001405.2 GCA_000001405 2 GRCh37.p1 Minor release
GCA_000001405.3 GCA_000001405 3 GRCh37.p2 Minor release
GCA_000001405.4 GCA_000001405 4 GRCh37.p3 Minor release
GCA_000001405.5 GCA_000001405 5 GRCh37.p4 Minor release
GCA_000001405.6 GCA_000001405 6 GRCh37.p5 Minor release
GCA_000001405.7 GCA_000001405 7 GRCh37.p6 Minor release
GCA_000001405.8 GCA_000001405 8 GRCh37.p7 Minor release
GCA_000001405.9 GCA_000001405 9 GRCh37.p8 Minor release
GCA_000001405.10 GCA_000001405 10 GRCh37.p9 Minor release
GCA_000001405.11 GCA_000001405 11 GRCh37.p10 Minor release
GCA_000001405.12 GCA_000001405 12 GRCh37.p11 Minor release
GCA_000001405.13 GCA_000001405 13 GRCh37.p12 Minor release
GCA_000001405.14 GCA_000001405 14 GRCh37.p13 Minor release
GCA_000001405.15 GCA_000001405 15 GRCh38 Major release
GCA_000001405.16 GCA_000001405 16 GRCh38.p1 Minor release
GCA_000001405.17 GCA_000001405 17 GRCh38.p2 Minor release
GCA_000001405.18 GCA_000001405 18 GRCh38.p3 Minor release
GCA_000001405.19 GCA_000001405 19 GRCh38.p4 Minor release
GCA_000001405.20 GCA_000001405 20 GRCh38.p5 Minor release
GCA_000001405.21 GCA_000001405 21 GRCh38.p6 Minor release
GCA_000002115.2 GCA_000002115 2 WGSA Major release
GCA_000002125.1 GCA_000002125 1 HuRef Major release
GCA_000002125.2 GCA_000002125 2 HuRef Major release
GCA_000002135.1 GCA_000002135 1 CRA_TCAGchr7v1 Major release
GCA_000002135.2 GCA_000002135 2 pre-CRA_TCAGchr7v2 Major release
GCA_000002135.3 GCA_000002135 3 CRA_TCAGchr7v2 Major release
GCA_000004845.1 GCA_000004845 1 YH1 Major release
GCA_000004845.2 GCA_000004845 2 YH_2.0 Major release
GCA_000005465.1 GCA_000005465 1 BGIAF Major release
GCA_000181135.1 GCA_000181135 1 Watson-partial Major release
GCA_000185165.1 GCA_000185165 1 HsapALLPATHS1 Major release
GCA_000212995.1 GCA_000212995 1 HuRefPrime Major release
GCA_000306695.1 GCA_000306695 1 CHM1_1.0 Major release
GCA_000306695.2 GCA_000306695 2 CHM1_1.1 Major release
GCA_000365445.1 GCA_000365445 1 CSA Major release
GCA_000442295.1 GCA_000442295 1 RP11_1.0_unmatched_regions Major release
GCA_000442335.1 GCA_000442335 1 LinearCen1.0 (summed) Major release
GCA_000442335.2 GCA_000442335 2 LinearCen1.1 (normalized) Major release
GCA_000772585.1 GCA_000772585 1 ASM77258v1 Major release
GCA_000772585.2 GCA_000772585 2 ASM77258v2 Major release
GCA_000772585.3 GCA_000772585 3 ASM77258v3 Major release
GCA_000786075.2 GCA_000786075 2 hs38d1 Major release
GCA_000983455.1 GCA_000983455 1 CHM13 Draft Assembly Major release
GCA_000983455.2 GCA_000983455 2 CHM13 Draft Assembly Major release
GCA_000983465.1 GCA_000983465 1 CHM13 Default 5% Error Major release
GCA_000983475.1 GCA_000983475 1 CHM13 CA Conservative 2.5% Error Major release
GCA_001007805.1 GCA_001007805 1 PacBioCHM1_r1_02092014 Major release
GCA_001013985.1 GCA_001013985 1 ASM101398v1 Major release
GCA_001015355.1 GCA_001015355 1 CHM13 CA Sensitive 5% Error Major release
GCA_001015385.1 GCA_001015385 1 CHM13 CA Sensitive 2.5% Error Major release
GCA_001015385.3 GCA_001015385 3 CHM13 CA Sensitive 2.5% Error Major release
GCA_001292825.1 GCA_001292825 1 HS1011_v1 Major release
GCA_001292825.2 GCA_001292825 2 HS1011_v1.1 Major release
GCA_001297185.1 GCA_001297185 1 PacBioCHM1_r2_GenBank_08312015 Major release
GCA_001307015.1 GCA_001307015 1 CA P5+P6 CHM1 Assembly Conservative 2.5% Error Major release
GCA_001307025.1 GCA_001307025 1 CA P6 CHM1 Conservative 2.5% Major release
GCA_001307125.1 GCA_001307125 1 CA P5+P6 CHM1 Assembly Default 5% Error Major release
GCA_001420745.1 GCA_001420745 1 CA P5+P6 CHM1 Assembly Sensitive 2.5% Error Major release
GCA_001420755.1 GCA_001420755 1 CA P5+P6 CHM1 Assembly Sensitive No Breaking 5% Error Major release
GCA_001420765.1 GCA_001420765 1 CA P5+P6 CHM1 Assembly Sensitive No Breaking 2.5% Error Major release
GCA_001421375.1 GCA_001421375 1 CA P5+P6 CHM1 Assembly Sensitive 5% Error Major release
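
The DECODE function in the query above is specific to Oracle. A standard CASE expression gives the same mapping on other engines. The sketch below checks this in an in-memory SQLite database from Python, as an illustration only: the table is cut down to the columns the query needs, and the two rows are taken from the result listing above, with IS_PATCH values inferred from the listed release types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table gc_assembly_set (
        set_acc text, set_chain text, set_version integer, name text,
        is_patch text, tax_id integer, status_id integer);
    insert into gc_assembly_set values
        ('GCA_000001405.14', 'GCA_000001405', 14, 'GRCh37.p13', 'Y', 9606, 4),
        ('GCA_000001405.15', 'GCA_000001405', 15, 'GRCh38', 'N', 9606, 4);
""")

# CASE replaces decode(is_patch, 'N', 'Major release', 'Minor release').
query = """
    select set_acc,
           name,
           case is_patch when 'N' then 'Major release'
                         else 'Minor release' end as release_type
    from gc_assembly_set
    where tax_id = 9606 and status_id = 4
    order by set_chain, set_version
"""
release_rows = conn.execute(query).fetchall()
for row in release_rows:
    print(row)
# ('GCA_000001405.14', 'GRCh37.p13', 'Minor release')
# ('GCA_000001405.15', 'GRCh38', 'Major release')
```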

Select all EMBL-Bank/GenBank releases of the GRCh38 assembly

select 
    set_acc as "Assembly Accession",
    set_chain as "Assembly Chain",
    set_version as "Assembly Version",
    gc_assembly_set.name as "Assembly Name",
    decode(is_patch,'N','Major release', 'Minor release') as "Release Type"
from gc_assembly_set
where name like 'GRCh38%' and status_id = 4
order by set_chain, set_version;

Rows returned 26 Jan 2016:

Assembly Accession Assembly Chain Assembly Version Assembly Name Release Type
GCA_000001405.15 GCA_000001405 15 GRCh38 Major release
GCA_000001405.16 GCA_000001405 16 GRCh38.p1 Minor release
GCA_000001405.17 GCA_000001405 17 GRCh38.p2 Minor release
GCA_000001405.18 GCA_000001405 18 GRCh38.p3 Minor release
GCA_000001405.19 GCA_000001405 19 GRCh38.p4 Minor release
GCA_000001405.20 GCA_000001405 20 GRCh38.p5 Minor release
GCA_000001405.21 GCA_000001405 21 GRCh38.p6 Minor release

Select all chromosomes for GRCh38 assembly

select 
    name as "Chromosome",
    replicon_type as "Chromosome Type",
    replicon_acc as "Sequence Accession"
from gc_replicon
where set_acc = 'GCA_000001405.15' and replicon_acc is not null;

Rows returned 26 Jan 2016:

Chromosome Chromosome Type Sequence Accession
1 Chromosome CM000663.2
2 Chromosome CM000664.2
3 Chromosome CM000665.2
4 Chromosome CM000666.2
5 Chromosome CM000667.2
6 Chromosome CM000668.2
7 Chromosome CM000669.2
8 Chromosome CM000670.2
9 Chromosome CM000671.2
10 Chromosome CM000672.2
11 Chromosome CM000673.2
12 Chromosome CM000674.2
13 Chromosome CM000675.2
14 Chromosome CM000676.2
15 Chromosome CM000677.2
16 Chromosome CM000678.2
17 Chromosome CM000679.2
18 Chromosome CM000680.2
19 Chromosome CM000681.2
20 Chromosome CM000682.2
21 Chromosome CM000683.2
22 Chromosome CM000684.2
X Chromosome CM000685.2
Y Chromosome CM000686.2
MT Mitochondrion J01415.2