What are Datasets?

Datasets are defined collections of files whose access is governed by a Data Access Committee (DAC).

Total number of Datasets: 3665

Dataset Accession Description Technology Samples File Types
EGAD00010001226 UK Biobank directly genotyped dataset Affymetrix 488,377
EGAD00010001225 UK Biobank autosomal imputation dataset Affymetrix 487,409
EGAD00010001192 Germline genotype data on 56,479 ovarian cancer cases and controls Illumina OncoArray 56,479
EGAD00001000623 This VCF contains the full sequence data post QC. This consists of 41,911 individuals. All polymorphic sites are present in this VCF. 41,911 vcf
EGAD00000000029 Aggregate results from a case-control study on stroke and ischemic stroke. 19,602
EGAD00010000738 Generation Scotland APOE data 18,336
EGAD00010000298 All cases and controls (Hap300) 13,761
EGAD00001002246 The T2D-GENES/GoT2D 13K exome sequencing study includes ~13,000 samples, half T2D cases and half T2D controls, from five ancestries (~5K Europeans, ~2K each of African-American, East-Asian, South-Asian, and Hispanic). Samples underwent deep exome sequencing, with SNVs and INDELs called according to GATK best practices; variant sites were then filtered according to the GATK best practices, and then samples and variants underwent further filtering based on aggregate genotype quality as described in Fuchsberger et al. (e.g. low call rate, excess heterozygosity for samples, low call rate or coverage for variants). Please note that one of the samples in the T2D-GENES vcf does not have phenotype data. 13,007 phenotype_file,vcf
EGAD00001002748 DDD DATAFREEZE 2014-11-04: 4293 trios - exome sequence CRAM files 12,548 cram,bam
EGAD00001001848 DDD DATAFREEZE 2014-11-04: 4293 trios - VCF files 12,539 vcf
EGAD00001001977 DDD DATAFREEZE 2014-11-04: 4293 trios - phenotypic and family descriptions 12,539 phenotype_file
EGAD00010000234 WTCCC2 samples from 1958 British Birth Cohort Illumina HumanExome-12v1_A-GenCall, zCall 12,241
EGAD00010000286 All cases and controls (Hap550) Illumina (various) 11,950
EGAD00010000901 Russian Tuberculosis samples using Affymetrix 6.0 Affymetrix Genome-Wide Human SNP Array 6.0 Genotypes 11,937
EGAD00010001158 Genotyping of additional Inflammatory Bowel Disease cases - 2014 (all samples) Illumina Human Core Exome 12v1-1_a 11,767
EGAD00010001081 Summary statistics for Malaria Genomic Epidemiology Network, "A novel locus of resistance to severe malaria in a region of ancient balancing selection", Nature (2015) Illumina Omni 2.5M 11,657
EGAD00000000120 WTCCC2 project Multiple Sclerosis (MS) samples Human670-QuadCustom v1 11,375
EGAD00001002729 Haplotype Reference Consortium Release 1.1 - subset for release via the EGA 11,227 other,vcf,readme_file,tabix,vcf_aggregate
EGAD00010000246 Coeliac disease cases and control samples. (1958BC samples excluded) Illumina ImmunoBeadChip - Illuminus, GenoSNP 10,758
EGAD00010000890 Understanding Society GWAS, all samples Illumina HumanCoreExome-12v1-0 10,463
EGAD00010000292 All cases and Finnish, Dutch, Italian control samples (Hap300) 10,339
EGAD00010000918 Understanding Society GWAS, samples that passed quality control, imputed to UK10K + 1000 Genomes combined reference panel Illumina HumanCoreExome-12v1-0 chip, UK10K + 1000 Genomes combined reference panel imputed 9,944
EGAD00010000891 Understanding Society GWAS, samples that passed quality control Illumina HumanCoreExome-12v1-0 9,944
EGAD00010001157 Genotyping of additional Inflammatory Bowel Disease cases - 2014 (QC pass samples) Illumina Human Core Exome 12v1-1_a 9,247
EGAD00010000858 Achalasia cases & controls 8,151
EGAD00010000662 Finnish population cohort genotyping 7,803
EGAD00010000676 ELSA genome-wide genotypes, including estimated related individuals. There are 3 files: .fam, .bim, .bed 7,452
EGAD00010001246 UK TGCT control samples using the Infinium OncoArray-500K BeadChip Infinium OncoArray-500K BeadChip 7,422
EGAD00010000674 ELSA genome-wide genotypes, excluding estimated related individuals. There are 3 files: .fam, .bim, .bed 7,412
EGAD00001002014 Isolated populations have unique population genetics characteristics that can help boost power in genetic association studies for complex traits. Leveraging these advantageous characteristics requires an in-depth understanding of parameters that have shaped sequence variation in isolates. This study performs a comprehensive investigation of these parameters using low-depth whole genome sequencing (WGS) across multiple isolates. 6,840 sample_list,vcf
EGAD00010000248 1958BC control samples Illumina ImmunoBeadChip - Illuminus, GenoSNP 6,812
EGAD00000000028 Aggregate results from a GWAS study on 3352 cases and 3145 controls iSelect Beadchip 6,497
EGAD00010000787 Epigen-Brasil samples using HumanOmni2.5 6,487
EGAD00010000288 All cases and Finnish, Dutch, Italian control samples (Hap550) 6,313
EGAD00000000044 Northern Finland Birth Cohort 1966 samples Illumina HumanHap370 5,844
EGAD00001001319 The aim of this study is to ascertain whether leukaemic mutations exist within the blood of people with otherwise normal haematopoiesis. To this end, we plan to look for 7 known leukaemic mutations in the whole blood DNA of a large cohort of blood donors who have normal haematopoiesis. Genomic regions around mutational sites have been amplified using a two-step PCR process which involves barcoding of individual patients. Illumina MiSeq; 5,817 cram
EGAD00010000902 Genome-wide study of resistance to severe malaria in eleven worldwide populations: Gambia Illumina Omni 2.5M 5,594
EGAD00010000742 Subset 1 of osteoarthritis cases genotyped on Illumina610k from the arcOGEN Consortium (http://www.arcogen.org.uk/) with broader consent. 5,383
EGAD00001003337 T cells isolated from peripheral blood, tumors and adjacent normal tissues from six hepatocellular carcinoma patients. SmartSeq2 and Tang2009 protocols were used to amplify RNA from single T cells. High depth enables simultaneous expression profiling and TCR assembly. Illumina HiSeq 2500 (ILLUMINA), Illumina HiSeq 4000 (ILLUMINA) 5,063
EGAD00010001243 UK TGCT control samples using the Infinium 1.2M array Illumina Infinium 1.2M array 4,946
EGAD00010000950 WTCCC2 Bacteraemia Susceptibility (BS) samples using Affymetrix 6.0 Affymetrix 6.0 4,924
EGAD00010000965 Array data from 4778 individuals from general population of rural Uganda Illumina HumanOmni2.5-8 BeadChip 4,778
EGAD00001002221 Whole exome sequencing of a subset of participants from the INTERVAL study. Illumina HiSeq 2000; 4,502
EGAD00010001416 BBMRI - BIOS project - Freeze 2 - methylation Illumina Human Methylation 450k BeadChip 4,386
EGAD00010000983 MeDIP-seq RPM chromosome BED files for Peripheral Blood from EPITWIN Project (Columns 4-4353 represent samples) MeDIP-seq 4,350
EGAD00010000874 Understanding Society Sequenom genotypes Sequenom 4,295
EGAD00010000264 WTCCC2 project samples from Ischaemic Stroke Cohort Illumina_670k - Illuminus 4,205
EGAD00010000807 Illumina HumanCoreExome genotyping data from the British Society for Surgery of the Hand Genetics of Dupuytren's Disease consortium (BSSH-GODD consortium) collection 4,201
EGAD00010000282 Pharmacogenomic response to Statins samples (Genotypes/Phenotypes) Affymetrix 6.0 - CHIAMO 4,134
EGAD00010000887 Freeze 1 of the RP3 project Illumina Human Methylation 450k BeadChip 3,898
EGAD00010000904 Genome-wide study of resistance to severe malaria in eleven worldwide populations: Kenya Illumina Omni 2.5M 3,865
EGAD00001000776 UK10K_COHORT_IMPUTATION REL-2012-06-02: imputation reference panel (20140306); Merged UK10K+1000Genomes Phase 3 imputation reference panel added (20160420) Illumina HiSeq 2000; 3,781 other,readme_file,bam
EGAD00000000115 Summary data from GWAS analysis on 856 cases and 2836 controls Illumina CytoSNP-12 3,719
EGAD00001003757 BBMRI - BIOS project - Freeze 2 - Fastq files Illumina HiSeq 2000;ILLUMINA 3,686
EGAD00001003758 BBMRI - BIOS project - Freeze 2 - Bam files Illumina HiSeq 2000;ILLUMINA 3,686
EGAD00010000620 Controls 3,683
EGAD00010000618 Ischemic stroke cases 3,682
EGAD00010000602 WTCCC2 Reading and Mathematics ability (RM) samples from UK using the Affymetrix 6.0 array 3,665
EGAD00010001420 Read counts determined using HTSeq-count for the BBMRI BIOS Freeze 2 RNAseq data RNAseq 3,560
EGAD00001003784 BBMRI - BIOS project - Freeze 2 - Bam files - unrelated samples Illumina HiSeq 2000;ILLUMINA 3,559
EGAD00001003785 BBMRI - BIOS project - Freeze 2 - Fastq files - unrelated samples Illumina HiSeq 2000;ILLUMINA 3,559
EGAD00010000570 Imputation-based meta-analysis of severe malaria in Kenya. 3,343
EGAD00001001355 DDD DATAFREEZE 2013-12-18: 1133 trios - VCF files (Ref: DDD Nature 2015) 3,335 readme_file,tab,vcf
EGAD00001001114 DDD DATAFREEZE 2013-12-18: 1133 trios - exome sequence BAM files (Ref: DDD Nature 2015) 3,335 bam,tab
EGAD00001001413 DDD DATAFREEZE 2013-12-18: 1133 trios - README, family trios, phenotypes, validated DNMs (Ref: DDD Nature 2015) 3,335 readme_file,tab
EGAD00001003329 The offspring of first cousin marriages have ~6% of their genome autozygous, i.e. homozygous identical by descent, or even more if there was further consanguinity in their ancestry. In the UK there are large populations with very high first cousin marriage rates of 20-50%. Sequencing the exomes of a sample of these individuals has the potential both to support genetic health programmes in these populations, and to provide genetic research information about rare loss of function mutations. This pilot study based on existing cohort samples from the Born In Bradford study will identify homozygous individuals for almost all variants down to an allele frequency around 1%, plus individuals carrying hundreds of new homozygous rare loss-of-function variants, and will support development of community relations and ethics for a wider study currently being designed. The data deposited in the EGA consist of low coverage whole exome sequencing on these samples. This dataset contains all the data available for this study on 2017-05-11. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 3,188
EGAD00001002743 These samples comprise both melanoma cases and controls sequenced for a selection of loci linked to disease susceptibility. These bams are a subset of the sequencing restricted specifically to the GRCh37 coding areas of the BAP1 gene. 3,186 bam
EGAD00010000236 WTCCC2 samples from Coronary Artery Disease Cohort - Illuminus, GenoSNP 3,125
EGAD00001002115 Targeted sequencing of 173 genes in 2433 primary breast tumours. Data includes 2433 tumour samples, 523 adjacent normal (breast) samples and 127 blood samples. Libraries were prepared with Illumina's Nextera custom enrichment kit targeting all the exons of the most frequently mutated breast cancer genes. Libraries were multiplexed (48 libraries per lane) and sequenced on Illumina HiSeq 2000 (100bp paired-end reads). Somatic mutations were called with a custom pipeline. We identified 40 mutation-driver (Mut-driver) genes, and determined associations between mutations, driver CNA profiles, clinical-pathological parameters and survival. We assessed the clonal states of Mut-driver mutations, and estimated levels of intra-tumour heterogeneity using mutant-allele fractions. The results emphasize the importance of genome-based stratification of breast cancer, and have important implications for designing therapeutic strategies. Reference: Pereira et al. (2016) The somatic mutation profiles of 2,433 breast cancers refines their genomic and transcriptomic landscapes. Nature Communications 3,083 bam,bai
EGAD00010000926 Subset 1 of osteoarthritis cases from the arcOGEN Consortium (http://www.arcogen.org.uk/) genotyped on HumanCoreExome-12v1-1 with broader consent. Illumina HumanCoreExome-12v1-1 3,075
EGAD00010000250 NBS control samples Illumina ImmunoBeadChip - Illuminus, GenoSNP 3,030
EGAD00010000262 WTCCC2 project Schizophrenia (SP) samples Affymetrix 6.0 - CHIAMO 3,019
EGAD00001003513 This dataset includes bam files from 3,001 samples. These bam files include all read pairs where at least one of the reads aligns within 1kb of the C9orf72 repeat expansion. Additionally, these bam files also contain reads that are aligned to any of 29 pre-determined off target locations where the aligners are known to mis-align reads associated with this repeat expansion. These samples were sequenced using a combination of 2x100bp reads on an Illumina HiSeq2000 and 2x150bp reads on an Illumina HiSeqX sequencer and aligned using the Isaac aligner. HiSeq X Ten;ILLUMINA, Illumina HiSeq 2000;ILLUMINA 3,001
EGAD00000000022 WTCCC2 project samples from 1958 British Birth Cohort Illumina 1.2M 3,000
EGAD00000000021 WTCCC2 project samples from 1958 British Birth Cohort Affymetrix 6.0 3,000
EGAD00010000929 WTCCC3_Primary Biliary Cirrhosis Replication Illumina ImmunoChip 2,981
EGAD00010000232 WTCCC2 samples from Type 2 Diabetes Cohort - Illuminus 2,975
EGAD00010000230 WTCCC2 samples from Hypertension Cohort - Illuminus 2,943
EGAD00010000634 WTCCC2 People of the British Isles (POBI) samples using Affymetrix 6.0 array 2,930
EGAD00001000401 Population based sequencing of whole genomes of Crohn's disease patients. Illumina HiSeq 2000 (ILLUMINA) 2,926
EGAD00010000632 WTCCC2 People of the British Isles (POBI) samples using Illumina 1.2M array 2,912
EGAD00010001289 Resolving the Genetic Architecture of Aseptic Loosening After Total Hip Replacement Illumina InfiniumCoreExome-24v1-1_A 2,880
EGAD00001002247 The GoT2D study includes ~2800 samples, half T2D cases and half T2D controls, of Northern European ancestry assayed with three technologies: deep whole exome sequencing, low-pass (4x) whole genome sequencing, and OMNI 2.5M genotyping. Samples were ascertained to be phenotypically "extreme" (e.g. leaner, younger cases and older, more obese controls). Genotypes (SNVs, INDELs, and SVs) were called separately for each technology and then integrated via genotype refinement into a single phased reference panel; samples and variants were then excluded based on QC procedures described in Fuchsberger et al. Please note that 2 of the samples in the GoT2D vcf do not have phenotype data. 2,872 phenotype_file,vcf
EGAD00010000572 Imputation-based meta-analysis of severe malaria in Gambia. 2,870
EGAD00000000025 WTCCC2 project Ulcerative Colitis (UC) samples Affymetrix 6.0 2,869
EGAD00010000928 WTCCC3_Primary Biliary Cirrhosis Replication Post-QC Illumina ImmunoChip 2,861
EGAD00010001428 Cardio-Metabochip genotypes for IHIT cohort Illumina 2,791
EGAD00010000584 WTCCC2 Glaucoma samples using Illumina 670k array 2,765
EGAD00001000747 Genomic libraries will be generated from total genomic DNA derived from 4000 samples with Acute Myeloid Leukaemia. Libraries will be enriched for a selected panel of genes using a bespoke pulldown protocol. 64 samples will be individually barcoded and subjected to up to one lane of Illumina HiSeq. Paired reads will be mapped to build 37 of the human reference genome to facilitate the characterisation of known gene mutations in cancer as well as the validation of potentially novel variants identified by prior exome sequencing. Illumina HiSeq 2000; 2,734 cram
EGAD00000000058 Aggregate results from 22 Carbamazepine-induced hypersensitivity syndrome patients and 2691 UK National Blood Service (NBS) control samples Illumina Infinium 1.2M 2,713
EGAD00001001079 The offspring of first cousin marriages have ~6% of their genome autozygous, i.e. homozygous identical by descent, or even more if there was further consanguinity in their ancestry. In the UK there are large populations with very high first cousin marriage rates of 20-50%. Sequencing the exomes of a sample of these individuals has the potential both to support genetic health programmes in these populations, and to provide genetic research information about rare loss of function mutations. This pilot study based on existing cohort samples from the Born In Bradford study will identify homozygous individuals for almost all variants down to an allele frequency around 1%, plus individuals carrying hundreds of new homozygous rare loss-of-function variants, and will support development of community relations and ethics for a wider study currently being designed. The data deposited in the EGA consist of low coverage whole exome sequencing on these samples. Data access is controlled by the Wellcome Trust Sanger Institute DAC and the Born In Bradford Executive Group. This dataset contains all the data available for this study on 2014-11-20. Illumina HiSeq 2000; 2,702 vcf,cram
EGAD00010000124 Psoriasis cases as part of WTCCC2 phase 2 Illumina_670k - Illuminus 2,622
EGAD00000000030 T1DGC project 1958 British Birth Cohort samples Illumina HumanHap 550 2,604
EGAD00001001014 Illumina HiSeq 2000; 2,597 bam
EGAD00010000284 NBS control samples only (Hap300) Illumina (Various) 2,500
EGAD00000000121 Genotypes at MITF E318K variant Taqman and sequencing 2,488
EGAD00010000294 1958BC control samples only (Hap300) 2,436
EGAD00010000750 German glioma control germline genotypes using Illumina HumanExome-12v1_A array Illumina HumanExome-12v1_A 2,391
EGAD00010000744 Subset 2 of osteoarthritis cases genotyped on Illumina 610k from the arcOGEN Consortium (http://www.arcogen.org.uk/) with consent for osteoarthritis studies only. 2,326
EGAD00001000740 UK10K_COHORT_ALSPAC REL-2012-06-02: Low-coverage whole genome sequencing; variant calling, genotype calling and phasing Illumina HiSeq 2000 (ILLUMINA) 2,320
EGAD00010000290 NBS control samples only (Hap550) 2,276
EGAD00010000296 1958BC control samples only (Hap550) 2,224
EGAD00001002727 1,591 single cells from 11 colorectal cancer patients were profiled using a Fluidigm-based single-cell RNA-seq protocol to characterize cellular heterogeneity of colorectal cancer. 630 single cells from 7 cell lines were profiled similarly to benchmark de novo cell type identification algorithms. Illumina HiSeq 3000;ILLUMINA 2,221
EGAD00001001622 BBMRI - BIOS project - Freeze 1 - Fastq files Illumina HiSeq 2000; 2,199 fastq
EGAD00010000604 DNA methylation data using Illumina 450K 2,195
EGAD00010000162 Illumina HT 12 IDATS Illumina HT 12 2,136
EGAD00001001623 BBMRI - BIOS project - Freeze 1 - Bam files 2,117 bam,contig_fasta
EGAD00000000043 GenomeEUtwin control samples Illumina HumanHap300-Duo, Illumina HumanHap 550K 2,099
EGAD00000000005 WTCCC1 project Inflammatory Bowel Disease (IBD) samples Affymetrix 500K 2,005
EGAD00010000150 WTCCC2 project samples from Ankylosing spondylitis Cohort Illumina_670k - Illuminus 2,005
EGAD00000000006 WTCCC1 project Hypertension (HT) samples Affymetrix 500K 2,001
EGAD00000000008 WTCCC1 project Type 1 Diabetes (T1D) samples Affymetrix 500K 2,000
EGAD00001001639 Low depth (4x) Illumina HiSeq raw sequence data for 2000 Ugandans from various ethno-linguistic groups from rural South-West Uganda (related individuals included). Illumina HiSeq 2000; 2,000 bam,cram
EGAD00000000009 WTCCC1 project Type 2 Diabetes (T2D) samples Affymetrix 500K 1,999
EGAD00000000007 WTCCC1 project Rheumatoid arthritis (RA) samples Affymetrix 500K 1,999
EGAD00000000004 WTCCC1 project Coronary Artery Disease (CAD) samples Affymetrix 500K 1,998
EGAD00000000003 WTCCC1 project Bipolar Disorder (BD) samples Affymetrix 500K 1,998
EGAD00010000164 Affymetrix 6.0 CEL files Affymetrix SNP 6.0 1,992
EGAD00001000409 2000 ulcerative colitis cases drawn from the UKIBD Genetics Consortium cohort and whole-genome sequenced at 2X depth. A case control association study using control samples whole-genome sequenced by UK10K will be undertaken to identify common, low-frequency and rare variants associated with ulcerative colitis. Data will be combined with similar data across 3000 Crohn's disease cases from the same cohort to identify inflammatory bowel disease (IBD) loci and better understand the genetic differences and similarities of the two common forms of IBD. Illumina HiSeq 2000; 1,992 bam
EGAD00010000506 WTCCC2 BO (Barrett's oesophagus) samples Illumina_670k-Illuminus 1,991
EGAD00010000854 WTCCC3 UK maternal cases of pre-eclampsia Illumina Human670-QuadCustom_v1 1,990
EGAD00001000253 AML targeted resequencing study Illumina HiSeq 2000; 1,972 bam
EGAD00001000789 UK10K_COHORT_ALSPAC REL-2012-06-02: Phenotype data 1,927 phenotype_file,readme_file
EGAD00000000122 Genotypes at MITF E318K variant Illumina HumanHap 300 v2 Duo, Illumina HumanCNV370, Illumina Human660W-Quad 1,925
EGAD00001001236 Targeted capture and resequencing of 94 known myeloid genes across MPN trials (PT1 and Voriconazole study) and other MPN samples. Illumina HiSeq 2000; 1,860 cram
EGAD00001000741 UK10K_COHORT_TWINSUK REL-2012-06-02: Low-coverage whole genome sequencing; variant calling, genotype calling and phasing Illumina Genome Analyzer II;, Illumina HiSeq 2000; 1,854
EGAD00001000790 UK10K_COHORT_TWINSUK REL-2012-06-02: Phenotype data 1,854 phenotype_file,readme_file
EGAD00001003176 For each subject, targeted next-generation sequencing was performed on genomic DNA from whole blood, circulating cell-free DNA and tumor tissue (whenever possible) on Illumina MiSeq or HiSeq 4000 platforms. The sequencing results from whole blood were used to distinguish germline from somatic mutations. Specimens were collected from patients with different kinds of solid tumors, most of them lung cancer patients. Illumina MiSeq;ILLUMINA, Illumina HiSeq 4000;ILLUMINA 1,845
EGAD00001003600 Exome sequencing data for 1001 DLBCL patients and RNA sequencing data for 775 DLBCL patients Illumina HiSeq 2500;ILLUMINA 1,776
EGAD00010000598 PCGP Ph-likeALL SNP6 1,724
EGAD00010000594 SCOOP severe early-onset obesity cases 1,720
EGAD00001000194 UK10K_COHORT_TWINS REL-2011-12-01 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 1,713 vcf
EGAD00000000060 Samples from the UK Glomerulonephritis DNA bank Illumina 610K Quad, Illumina Hap300 1,705
EGAD00000000057 WTCCC project samples from the Parkinson's disease cohort Illumina 610K Quad 1,705
EGAD00000000056 WTCCC project samples from the primary biliary cirrhosis cohort Illumina 610K Quad 1,705
EGAD00010000536 21 unlinked autosomal microsatellite loci for 30 Central Asian populations Applied Biosystems 3100 automated sequencer-GeneMarker v.1.6 (Softgenetics) 1,702
EGAD00010000538 28 unlinked autosomal microsatellite loci for 20 African and 4 Philippine populations Applied Biosystems 3100 automated sequencer-GeneMarker v.1.6 (Softgenetics) 1,702
EGAD00001003215 This data set contains whole exome sequences of individuals with self-stated parental relatedness from the East London Genes & Health cohort. Rare functional variants in these healthy individuals will be studied with respect to the genetic health of the participants and loss-of-function analysis of human genes. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 1,702
EGAD00010001034 WTCCC3 Anorexia Nervosa GWAS Illumina Human670-QuadCustom_v1_A 1,696
EGAD00010001155 Crohn's disease DNA samples genotyped using UK Biobank Axiom array Axiom UKB 1,676
EGAD00001000656 FACS phenotype of 1629 Sardinian samples 1,629 phenotype_file
EGAD00010000872 Genotyped case and control samples using HumanExome Beadchip 1,610
EGAD00010000940 Gambian specimens with trachomatous scarring WHO grade C2/C3 Illumina Omni 2.5 1,531
EGAD00010000941 Gambian specimens without trachomatous scarring Illumina Omni 2.5 1,531
EGAD00000000001 WTCCC1 project samples from 1958 British Birth Cohort Affymetrix 500K 1,504
EGAD00000000014 WTCCC1 project samples from 1958 British Birth Cohort Illumina 15K 1,504
EGAD00010001004 WTCCC1 project samples from 1958 British Birth Cohort Infinium 550K 1,504
EGAD00000000002 WTCCC1 project samples from UK National Blood Service Affymetrix 500K 1,500
EGAD00000000016 WTCCC project Tuberculosis (TB) samples Affymetrix 500K 1,498
EGAD00000000015 WTCCC project African control samples Affymetrix 500K 1,496
EGAD00001001897 15x whole genome sequencing in samples from the Cretan Greek isolate collection HELIC MANOLIS HiSeq X Ten; 1,482 cram
EGAD00010000438 Normalized miRNA expression data Agilent ncRNA 60k 1,480
EGAD00010000444 Agilent ncRNA 60k txt files Agilent ncRNA 60k 1,480
EGAD00010000202 Case samples (Illumina_660K & Illumina_670K) Illumina_660K/Illumina_670K 1,478
EGAD00010001395 A replication cohort consisting of 1428 adult survivors of any non-ALL pediatric cancer Genome-Wide Human SNP Array 6.0 - Thermo Fisher Scientific 1,428
EGAD00001002212 Non-syndromic cases of congenital heart defects (CHD) exhibit variable modes of inheritance (Mendelian and non-Mendelian). Several studies have identified strong candidates in humans by taking a candidate gene approach as well as by using whole exome next generation sequencing (NGS). So far these studies could only explain a minor fraction of the observed phenotype in humans, most of them in syndromic cases, and no single study has focused on the subset of cases with left ventricular outflow tract obstruction (LVOTO). To discover novel disease-causing genes, a large cohort of patients with LVOTO (approximately 100 cases, 25 families and 100 trios) has been exome sequenced. This study yielded several compelling candidate genes, both known ones, such as MYH6, NR2F2 and MYH11, and novel ones, such as ITGB4. To evaluate the significance of our findings in a replication cohort, we assembled another 1614 cases with an LVOTO phenotype from our collaborators in Toronto, Berlin and Amsterdam. Targeted resequencing in this additional cohort will help to find additional cases with mutations in the identified candidate genes to strengthen genotype-phenotype association. We will use control data from the INTERVAL project for case/control analyses. The pulldowns will be performed as 24-plex ISC with 192 or greater indexes, and the sequencing will be performed with 192 samples per lane, requiring 9 lanes of sequencing. Illumina HiSeq 2000; 1,376 cram
EGAD00000000039 NcOEDG Malmo - Lund samples Affymetrix 500K 1,374
EGAD00010000522 Samples from the Greek island of Crete, MANOLIS cohort HumanOmniExpress-12 v1.1 BeadChip-GenCall 1,364
EGAD00000000059 Aggregate results from 43 Carbamazepine-induced hypersensitivity syndrome patients and 1296 1958 British Birth Cohort control samples Affymetrix 500K, Illumina 610K Quad 1,339
EGAD00001001432 PCGP Germline Study Whole Genome Sequencing Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2000; 1,337 bam
EGAD00010001427 Cardio-Metabochip genotypes for B99 cohort Illumina 1,336
EGAD00001003269 High-coverage whole genome sequencing (WGS) of DNA samples from 90 pairs of GCs was performed on the Illumina HiSeq X Ten System. Illumina HiSeq 2000;ILLUMINA 1,332
EGAD00010000442 Affymetrix SNP 6.0 CEL files Affymetrix_SNP6_raw 1,302
EGAD00010000434 Normalised mRNA expression Illumina HT 12 1,302
EGAD00010000436 Illumina HT 12 IDAT files Illumina HT 12 1,302
EGAD00010000440 Segmented copy number data Affymetrix_SNP6_raw 1,302
EGAD00010000518 Samples from the Greek island of Crete, MANOLIS cohort HumanExome_12v1.1_A -GenCall, zCall 1,280
EGAD00010001209 Genome-wide SNP genotyping data for 1,235 western Africans by Illumina HumanOmniExpress-12 array, used in the EGAS00001002078 study Illumina HumanOmniExpress-12 1,235
EGAD00010000650 Genotypes from Omni2.5 chip 1,213
EGAD00010001212 Genetic studies of pregnancy-related cardiometabolic disorders in Central Asian, Northern European, and Colombian populations Illumina HumanOmniExpress-12v1_J 1,207
EGAD00001000618 1204 Sardinian males 1,195 bam
EGAD00001001025 The offspring of first cousin marriages have ~6% of their genome autozygous, i.e. homozygous identical by descent, or even more if there was further consanguinity in their ancestry. In the UK there are large populations with very high first cousin marriage rates of 50-80%. Sequencing the exomes of a sample of these individuals has the potential both to support genetic health programmes in these populations, and to provide genetic research information about rare loss of function mutations. This pilot study based on existing British-Pakistani cohort samples from Birmingham will identify homozygous individuals for almost all variants down to an allele frequency around 1%, plus individuals carrying hundreds of new homozygous rare loss-of-function variants, and will support development of community relations and ethics for a wider study currently being designed. The data deposited in the EGA consist of low coverage whole exome sequencing on these samples. Illumina HiSeq 2000; 1,156 bam,cram
EGAD00001000750 UK10K_RARE_FIND REL-2013-10-31 variant calling Illumina HiSeq 2000; 1,151 tabix,vcf
EGAD00001001991 Meta-genomic sequencing of 1,200 LifeLines-DEEP participants Illumina HiSeq 2000; 1,135 bam
EGAD00001000288 Invasive lobular carcinoma (ILC) is the second most common histological subtype of breast cancer, accounting for 10-15% of cases. ILC differs from invasive ductal carcinoma (IDC) with respect to epidemiology, histology, and clinical presentation. Moreover, ILC is less sensitive to chemotherapy, more frequently bilateral, and more prone to form gastrointestinal, peritoneal, and ovarian metastases than IDC. In contrast to IDC, the prognostic value of histological grade (HG) in ILC is controversial. One of the three major components of histological grading (tubule formation) is missing in ILC, which hinders grading in this histological subtype and results in approximately two thirds of ILC being classified as HG 2. Over the last decade, a number of gene expression signatures have shed light onto breast cancer classification, allowing breast cancer care to become more personalized. With respect to the management of estrogen receptor (ER)-positive breast cancer, several gene expression signatures provide prognostic and/or predictive information beyond what is possible with classical clinico-pathological parameters alone. Nevertheless, most studies using gene expression signatures have not considered different histologic subtypes separately. Recently, a comprehensive research program has elucidated some of the biological underpinnings of invasive lobular carcinoma. Genetic material extracted from 200 ILC tumor samples was studied using gene expression profiling, which identified ILC molecular subtypes. These proliferation-driven gene signatures of ILC appear to have prognostic significance. In particular, the Genomic Grade (GG) gene signature improved upon HG in ILC and added prognostic value to classic clinico-pathologic factors. In addition, this study demonstrated that most ILC are molecularly characterized as luminal-A (~75%), followed by luminal-B (~20%) and HER2-positive tumors (~5%).
Moreover, we investigated the prognostic value of known gene signatures/gene modules in the same cohort of ILC. As a second step within the scope of this project, we aim to investigate the interactions between somatic ILC tumor mutations and the observed transcriptome findings. To this end, we aim to perform somatic mutation analysis for the ILC tumors for which Affymetrix gene expression profiling is available, using a gene screen assay that specifically interrogates the mutational status of a few hundred cancer genes. We believe that this pioneering effort will be fundamental for a tailored treatment of ILC with improvement in patients' outcome. Illumina HiSeq 2000; 1,130 bam,cram
EGAD00010001294 Methylation data using 450K Illumina 450k 1,128
EGAD00001001943 Here, we studied well-phenotyped individuals from the Flemish Gut Flora Project (FGFP, N=1,106, Belgium) and the effect of environmental factors on the microbiome. The 69 major significant phenotypes found in this study are provided. 1,106
EGAD00001001039 Genomic characterisation of a large series of cancer cell lines. Illumina HiSeq 2000; 1,072 bam,cram
EGAD00001001936 First 1106 16S rDNA data for the Flemish Gut Flora Project Illumina MiSeq; 1,061
EGAD00010000516 Samples from the Pomak Villages in Greece, Pomak isolate HumanExome_12v1.1_A -GenCall, zCall 1,046
EGAD00001000652 Pulldown experiments will be performed on a number of patients with myeloproliferative neoplasms (MPN). The pulldown will be a bespoke design targeting known mutations; it will be sequenced and analysed to determine the prevalence of these mutations and to assess its possible use as a diagnostic tool. Illumina HiSeq 2000; 1,036 bam
EGAD00010000554 SNP 6.0 arrays of small cell lung cancer 1,032
EGAD00001002715 Exome sequencing of isolate populations and Generation Scotland Illumina HiSeq 2000;ILLUMINA 1,027 bam
EGAD00010000644 Affymetrix SNP6.0 cancer cell line exome sequencing data 1,022
EGAD00001003453 16S sequencing of stool samples of LifeLines-DEEP, domain V4 Illumina MiSeq;ILLUMINA 1,010
EGAD00010000875 CLL Expression Array Affymetrix U219 1,008
EGAD00001002204 1006 familial early-onset germline CRC patients sequenced by the Molecular and Population Genetics group of the Institute of Cancer Research Illumina HiSeq 2500; 1,006 bam
EGAD00000000013 WTCCC1 project Breast cancer (BC) samples Illumina 15K 1,004
EGAD00001001637 Whole-genome sequencing at 1x of samples from the Cretan Greek isolate collection HELIC-MANOLIS. Genome-wide association studies of complex traits have been successful in identifying common variant associations, but a substantial heritability gap remains. The field of complex trait genetics is shifting towards the study of low frequency and rare variants, which are hypothesised to have larger effects. The study of these variants can be empowered by focusing on isolated populations, in which rare variants may have increased in frequency and linkage disequilibrium tends to be extended. This work focuses on an isolated population from Crete, Greece. Sequencing is very efficient in isolated populations, because variants found in a few samples will be shared by others in extended haplotype contexts, supporting accurate imputation. Illumina HiSeq 2000; 1,003 bam,cram
EGAD00010000158 Affymetrix 6.0 cel files Affymetrix SNP 6.0 1,001
EGAD00010000160 Illumina HT 12 IDATS Illumina HT 12 1,001
EGAD00001001021 Exome sequencing of 1000 samples from the UK 1958 Birth Cohort. DNA libraries were prepared with the Illumina TruSeq sample preparation kit. The captured DNA libraries were PCR amplified using the supplied paired-end PCR primers. Sequencing was performed with an Illumina HiSeq2000 (SBS Kit v3, one pool per lane), generating 2x101-bp reads. Illumina HiSeq 2500; 1,000 fastq
EGAD00010000217 Segmented (HMM) copy number aberrations (CNA); discovery set Affymetrix SNP 6.0 997
EGAD00010000213 Segmented (CBS) copy number aberrations (CNA); discovery set Affymetrix SNP 6.0 997
EGAD00010000210 Normalized expression data; discovery set Illumina HT 12 997
EGAD00010000214 Segmented (CBS) copy number variants (CNV); discovery set Affymetrix SNP 6.0 997
EGAD00010000215 Segmented (CBS) copy number aberrations (CNA); validation set Affymetrix SNP 6.0 995
EGAD00010000211 Normalized expression data; validation set Illumina HT 12 995
EGAD00010000216 Segmented (CBS) copy number variants (CNV); validation set Affymetrix SNP 6.0 995
EGAD00001000871 The purpose of this study is to sequence 500 known cancer genes in 960 newly diagnosed high-risk breast cancer patients treated with current standard-of-care therapies and trastuzumab, to detect somatic alterations and copy number changes. We will use next-generation sequencing technology to determine the prognostic relevance of these somatic genetic alterations and of the low-frequency events, and to determine whether they are associated with trastuzumab benefit or HER2-positive breast cancer, i.e. treatment interaction. The samples will be analysed and correlated with clinical variables including outcome. Illumina HiSeq 2000; 993 cram
EGAD00010000924 Subset 2 of osteoarthritis cases from the arcOGEN Consortium (http://www.arcogen.org.uk/) genotyped on HumanCoreExome-12v1-1 with consent for osteoarthritis studies only. Illumina HumanCoreExome-12v1-1 991
EGAD00010001255 Autosomal STR genotypes using 15 Identifiler loci Applied Biosystems 990
EGAD00001000432 UK10K_OBESITY_SCOOP REL-2013-04-20 Illumina HiSeq 2000; 985 vcf
EGAD00000000012 WTCCC1 project Multiple Sclerosis (MS) samples Illumina 15K 975
EGAD00010001258 Human Cardio Metabochip Illumina 973
EGAD00001002714 We recruited 100 healthy, male donors of self-reported European descent (EUB) and 100 of self-reported African descent (AFB) (Ghent, Belgium). For each participant, peripheral blood mononuclear cells (PBMCs) were isolated from whole blood on Ficoll-Paque density gradients. Monocytes were then positively selected with magnetic CD14 microbeads and exposed for 6 hours to different ligands activating TLR4 (LPS), TLR1/2 (Pam3CSK4), TLR7/8 (R848) and to a human seasonal influenza A virus (IAV). High-quality RNA was obtained from unstimulated and stimulated monocytes for 970 of the 1000 samples (200 x 5 conditions), and was sequenced on an Illumina HiSeq2000. On average, 34 million 101-bp single-end reads were obtained per sample. Illumina HiSeq 2000;ILLUMINA 970 fastq
EGAD00001001251 Low-coverage (4-6x) sequencing of samples from population cohorts (Finrisk, Health2000) will be done at the Wellcome Trust Sanger Institute (WTSI) using Illumina HiSeq sequencing technology. We will produce 100-bp paired-end reads. Variants will be called using the 1000 Genomes Project pipeline. The samples have been selected from a nationally representative set of approximately 30,300 samples and comprise 500 individuals of each gender in the extreme tail of high-density lipoprotein (HDL) concentrations. Included individuals were between 25 and 65 years of age. Individuals with a diagnosis of diabetes or BMI>30 were excluded from the study. Illumina HiSeq 2000; 966 bam
EGAD00001000657 DATA FILES FOR Histone Capture bams Illumina HiSeq 2000; 962 bam
EGAD00000000010 WTCCC1 project Ankylosing Spondylitis (AS) samples Illumina 15K 957
EGAD00010000953 Healthy adult volunteers and newborns recruited in various countries across Oceania. HumanCore-24 BeadChip 937
EGAD00001001358 463 newly diagnosed patients from the UK Myeloma XI clinical trial (NCT01554852) underwent whole-exome sequencing plus targeted capture of the IGH/K/L and MYC loci. 200 ng of DNA were processed using the NEBNext DNA library preparation kit and hybridised to the SureSelect Human All Exon V5 Plus. Four samples were pooled and run on one lane of a HiSeq 2000 using 76-bp paired-end reads. DNA from CD138+ selected bone marrow cells (myeloma tumour) as well as peripheral white blood cells was analysed and somatic mutations were detected. Illumina HiSeq 2000; 926 bam
EGAD00010001043 WTCCC3 Anorexia Nervosa Infinium-HumanCoreExome Illumina HumanCoreExome-12v1-0_A and HumanCoreExome-24v1-0_A 925
EGAD00010001412 Blood transcriptome from women participating in the Norwegian Women and Cancer study (NOWAC) Illumina HumanWG-6 version 3 or Illumina HumanHT-12 expression bead chip, combined on identical nucleotide universal identifier 920
EGAD00010001323 Medulloblastoma methylation profiling Illumina Infinium HumanMethylation450 BeadChip 911
EGAD00001000624 Multifocality or multicentricity in breast cancer may be defined as the presence of two or more tumor foci within a single quadrant of the breast or within different quadrants of the same breast, respectively. This original classification of breast cancer as multicentric or multifocal was based on the assumption that cancers arising in the same quadrant were more likely to arise from the same ductal structures than those occurring in separate areas of the breast. The problem with these definitions is that the 'quadrants' of the breast are arbitrary external designations, as no internal boundaries exist. This project will therefore focus on both synchronous multifocal and multicentric tumors. The incidence of multifocal and multicentric breast cancers has been reported to be between 13 and 75%, depending on the definition used, the extent of the pathologic sampling of the breast and whether in situ disease is considered evidence of multicentricity (1). Although this incidence is variable, these figures show that it is a frequent phenomenon. Multiple (multifocal/multicentric) breast carcinomas, especially when occurring in the same breast, represent a real challenge for both pathologists and clinicians in terms of identifying the cellular origin and the best therapeutic management of the cancer. Multifocality or multicentricity has been associated with a number of more aggressive features, including an increased rate of regional lymph node metastases and adverse patient outcome when compared with unifocal tumors (2-3), and a possible increased risk of local recurrence following breast-conserving surgery (4). For the moment, the literature is divided on whether there is a corresponding impact on survival outcomes.
Today, the current convention for staging and treating multifocal and multicentric tumors is the classical tumor-node-metastasis (TNM) staging guidelines, in which tumor size is assessed by the largest tumor focus without taking other foci of disease into consideration. While some papers, such as the recent one from Lynch and colleagues, support the current staging convention (3), others, such as Boyages et al., have suggested that the aggregate size, and not the size of the largest lesion, should be considered in order to refine the prognostic assessment of these tumors (5). In addition, whether multifocal/multicentric carcinomas are due to the spread of a single carcinoma throughout the breast or to multiple carcinomas arising simultaneously has been a matter of debate. Some studies suggest that multifocal breast cancer may result from either intramammary spread from a single primary tumor or multiple synchronous primary tumors, whereas others suggest that multiple breast carcinomas always arise from the same clone (6-8). Recently, Pietri and colleagues characterized the biology of a series of 113 multifocal/multicentric breast cancers (8) diagnosed over a 5-year period. The expression of estrogen (ER) and progesterone (PgR) receptors, the Ki-67 proliferative index, HER2 expression and tumor grade were prospectively determined in each tumor focus, and mismatches among foci were recorded. Mismatches in ER status were present in 5 (4.4%) cases and in PgR status in 18 (15.9%) cases. Mismatches in tumor grade were present in 21 (18.6%) cases, in proliferative index (Ki-67) in 17 (15%) cases and in HER2 status in 11 (9.7%) cases. Interestingly, this heterogeneity among foci led to 14 (12.4%) patients receiving different adjuvant treatments compared with what would have been indicated if only the biological status of the primary tumor had been taken into account.
This study therefore showed that differences in the biological characteristics of multifocal/multicentric lesions play a crucial role in the adjuvant treatment decision-making process. In this study, we will concentrate on a larger series of patients with multifocal invasive ductal breast cancer lesions. We aim at: 1. Evaluating the incidence of multifocality according to the different breast cancer molecular subtypes (ER-/HER2-, HER2+, ER+/HER2-). 2. Evaluating the incidence of multifocality in patients with hereditary breast cancer (presence of germline BRCA1 or BRCA2 mutations). Moreover, we would like to investigate whether multifocal lesions with BRCA1 or BRCA2 mutations exhibit a characteristic combination of substitution mutation signatures and a distinctive profile of deletions, as demonstrated recently by Nik-Zainal and colleagues (9). 3. Correlating multifocality with clinical information in order to define its influence on patients' survival (DFS and OS). 4. Carrying out high-coverage targeted sequencing of driver cancer genes and of genes whose mutation is of therapeutic importance, in order to compare clinically relevant genetic differences between several multifocal breast cancer lesions. 5. Evaluating the impact of the distance between the different lesions on the clinical outcome and on the genetic differences. 6. Comparing gene expression patterns between several multifocal breast cancer lesions and correlating them with the results of the targeted gene screen. 7. Characterizing the genomic and transcriptomic status of cancer-related genes in metastatic lesions (local recurrence, positive lymph node or distant metastatic sites) from the same multifocal invasive ductal breast cancer patients, in order to evaluate the consequences of genomic and transcriptomic heterogeneity of multifocal lesions on metastatic lesions.
Multiple (multifocal/multicentric) breast carcinomas, especially when occurring in the same breast, represent a real challenge for both pathologists and clinicians in terms of identifying the cellular origin and the best therapeutic choice. This project has the potential to identify genetic/transcriptomic differences between the several lesions constituting multifocal breast cancers, which in routine clinical practice are usually considered to be homogeneous. We foresee validating significant results in a larger series of patients, and this, in turn, could have a remarkable impact on the treatment and clinical management of multifocal breast cancers. Indeed, we hope to provide some evidence on whether or not each focus matters in multifocal and multicentric breast cancer, in order to define the adequate therapeutic approach, especially in the context of targeted therapies. The work to be done at Sanger will be target gene screen pooling of 1400 samples. Illumina HiSeq 2000; 908 bam,cram
EGAD00010000758 French glioma case germline genotypes using Illumina HumanExome-12v1_A array Illumina HumanExome-12v1_A 906
EGAD00001001433 PCGP Germline Study Whole Exome Sequencing Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2000; 906 bam
EGAD00010000766 We have established a mechanism for the collection of postal DNA samples from consenting National Joint Registry for England and Wales (NJR) patients and have carried out genome-wide genotyping in 903 patients with the condition Developmental Dysplasia of the Hip (DDH) on the Illumina CoreExome array 903
EGAD00000000011 WTCCC1 project Autoimmune Thyroid Disease (ATD) samples Illumina 15K 900
EGAD00010000752 German glioma case germline genotypes using Illumina HumanExome-12v1_A array Illumina HumanExome-12v1_A 899
EGAD00001001464 Exome sequencing. 3 µg of genomic DNA from each sample was sheared and used for the construction of a paired-end sequencing library as described in the paired-end sequencing sample preparation protocol provided by Illumina (41). Enrichment of exonic sequences was then performed for each library using either the SureSelect Human All Exon 50 Mb or All Exon+UTRs v4 kits following the manufacturer’s instructions (Agilent Technologies). Exon-enriched DNA was pulled down by magnetic beads coated with streptavidin (Invitrogen), followed by washing, elution and 18 additional cycles of amplification of the captured library. Enriched libraries were sequenced (2 × 76 bp) in one lane of an Illumina GAIIx sequencer or in two lanes of a HiSeq2000 when using pools of eight samples. 882 bam
EGAD00010001396 A discovery cohort of 856 adult survivors of pediatric ALL Genome-Wide Human SNP Array 6.0 - Thermo Fisher Scientific 856
EGAD00010000925 Subset 1 of osteoarthritis cases from the arcOGEN Consortium (http://www.arcogen.org.uk/) genotyped on HumanCoreExome-12v1-0 with broader consent. Illumina HumanCoreExome-12v1-0 855
EGAD00010000959 Healthy volunteers recruited in Fiji HumanCore-24 BeadChip 854
EGAD00010000596 PCGP Ph-likeALL GEA 837
EGAD00010000456 Leukemia samples using 450K DNA methylation 800
EGAD00001003118 Targeted capture sequencing for cases with MDS who were subjected to unrelated bone marrow transplantation via Japan marrow donor program 797
EGAD00001000336 UK10K_OBESITY_SCOOP REL-2012-11-27 Illumina HiSeq 2000; 784 vcf
EGAD00001001038 We mapped the data to the UCSC human reference genome build 37 using BWA 0.5.9-r16. We first mapped each read of a pair separately using bwa aln. Then we used bwa sampe to map the paired reads together to a BAM file (9). The BAM file was then sorted by genomic position and indexed using PicardTools-1.32 SortSam. To prevent PCR artifacts from influencing the downstream analysis of our data, we used Picard to mark duplicate reads, which were ignored in downstream analysis. We used GATK IndelRealigner on our data around known indels (from 1KG Pilot). The IndelRealigner creates all possible read alignments using the source and computes the likelihood of the data containing the indel based on the read pileup. Whenever the maximum likelihood contains an indel, the reads are realigned accordingly. Each base is associated with a Phred-scaled base quality score. Calibration of Phred scores is crucial, as they are used in some of the downstream analysis models. We used GATK to recalibrate the base qualities with respect to (i) the base cycle, (ii) the original quality score, and (iii) the dinucleotide context. To minimize issues stemming from mapping problems around indels, we performed a second round of indel realignment using the GATK IndelRealigner by family rather than by individual. For this second round, we considered two sources of possible indels: 1KG Phase 1 indels and indels aligned by BWA in the GoNL data. 769 bam
EGAD00001002261 These files contain indels and structural variants on 769 GoNL samples (SV release 6, 2016-05-25). Illumina HiSeq 2000; 769 vcf
EGAD00001002656 Whole exome sequencing BAM files and whole genome sequencing CRAM files for 722 individuals from the NIHR-BioResource Rare Diseases Consortium (SPEED project) with inherited retinal disease. Illumina HiSeq 2000;ILLUMINA 767
EGAD00001000743 These files contain a total of 20.4M SNVs and the complete information output by the GATK UnifiedGenotyper v1.4 on all 767 GoNL samples. These calls are not trio-aware and all genotypes were reported regardless of their quality. Both filtered and passing calls are reported in these files. Filtered calls include (1) calls failing our VQSR threshold and (2) calls in the GoNL inaccessible genome. 767 vcf
EGAD00001000821 Raw sequencing data for all samples in fastq format. Illumina HiSeq 2000; 767 fastq
EGAD00001001086 These are the BAM files for the LCL samples of the EUROBATS project. 765 bai,bam
EGAD00001000283 Agilent whole-exome hybridisation capture was performed on genomic DNA derived from MDS and matched normal DNA from the same patients. Next-generation sequencing was performed on the resulting exome libraries, and the reads were mapped to build 37 of the human reference genome to facilitate the identification of novel cancer genes. We now aim to determine the prevalence of our findings using bespoke pulldown methods and sequencing the products from a larger set of patient DNA. Illumina HiSeq 2000; 764 bam
EGAD00000000038 NcOEDG Stockholm 3 samples Illumina HumanHap 550 761
EGAD00010000448 Macrophage Gene Expression Illumina Human-Ref-8 v3 beadchip 758
EGAD00010000450 Genome Wide Genotype Data Illumina Human Custom 1,2M and Human 610 Quad Custom arrays 758
EGAD00010000446 Monocyte Gene Expression Illumina Human-Ref-8 v3 beadchip 758
EGAD00001003842 Whole-genome sequencing (WGS) of 367 tumor-normal pairs from the ICGC ESAD-UK project (tumors 50x, normals 30x; HiSeq X BAM files). These samples are all available in ICGC release 27. Illumina HiSeq 2000;ILLUMINA 746
EGAD00001000195 For information about this sample set, please contact the sample custodian Nic Timpson: N.J.Timpson@bristol.ac.uk Illumina HiSeq 2000; 740
EGAD00001001250 Low coverage (4-6x) sequencing on samples from population cohorts (Finrisk, Health2000) will be done at Wellcome Trust Sanger Institute (WTSI) using Illumina HiSeq sequencing technology. We will produce 100bp paired end reads. Variants will be called using the 1000 Genomes Project pipeline. The samples have been selected from a national representative set of 8028 samples from persons of 30 years or older, which were screened for psychotic and bipolar disorders using the Composite International Diagnostic Interview, self-reported diagnoses, medical examination, and national registers. Illumina HiSeq 2000; 731 bam
EGAD00010000704 610k genotyping imputed on Hapmap 3 and 1000G Phase 1 CEU 714
EGAD00001001354 Whole-exome sequencing of around 700 inflammatory bowel disease cases. This data can only be used for the identification of IBD/immune-mediated disease loci. Illumina HiSeq 2000; 702 cram
EGAD00001001332 Development of a method for separation and parallel sequencing of the genomes and transcriptomes of single cells. Illumina MiSeq;, HiSeq X Ten;, Illumina HiSeq 2500; 700 bam,cram
EGAD00010000756 French glioma control germline genotypes using Illumina HumanExome-12v1_A array Illumina HumanExome-12v1_A 699
EGAD00000000035 NcOEDG Helsinki 4 samples Illumina CNV370 693
EGAD00001001089 RNAseq BAM files for the Fat samples of the EUROBATS project 685 bai,bam
EGAD00010001422 1000G Phase 3 Imputed cases and controls from NSAID-induced PUD study Illumina Omni 2.5 676
EGAD00001000725 This dataset contains RNA sequencing data for 675 cancer cell lines. RNA libraries were made with the TruSeq RNA Sample Preparation kit (Illumina) according to the manufacturer's protocol. The libraries were sequenced on an Illumina HiSeq 2000. Illumina HiSeq 2000; 675
EGAD00001000241 EGAD00001000241_UK10K_OBESITY_SCOOP_REL_2012_07_05 Illumina HiSeq 2000; 674 vcf
EGAD00010000740 Osteoarthritis cases genotyped on Illumina HumanOmniExpress from the arcOGEN Consortium (http://www.arcogen.org.uk/) with broader consent. 674
EGAD00001001087 RNAseq BAM files for the Skin samples of the EUROBATS project. 672 bai,bam
EGAD00010000951 SNP array data for 668 cancer cell lines Illumina 2.5M 668
EGAD00001002251 Exome sequencing of families with Congenital Heart Defects of diverse sub-phenotypes. Comprises both parent-offspring trios for sporadic cases and multiplex families. Collaboration with David Brook, University of Nottingham. Funded by the British Heart Foundation. Illumina HiSeq 2000; 646 cram
EGAD00001000142 Renal Follow Up Series Illumina HiSeq 2000, Illumina HiSeq 2000; 637 bam
EGAD00001003703 The incidence of acute myeloid leukemia (AML) increases with age and mortality exceeds 90% when diagnosed after age 60. Only 10-15% of cases evolve from a pre-existing myeloproliferative or myelodysplastic disorder; the remaining cases arise de novo without a detectable prodrome and are diagnosed upon development of bone marrow failure. Analysis of diagnostic blood samples has demonstrated that de novo AML is preceded by the accumulation of somatic mutations in pre-leukemic hematopoietic stem and progenitor cells (preL-HSPCs) that subsequently undergo clonal expansion. If individuals in this pre-leukemic phase could be identified, methods for determination of risk and monitoring for progression to overt AML could be developed. However, recurrent AML mutations also accumulate during aging in healthy individuals who never develop AML, referred to as age-related clonal hematopoiesis (ARCH). To distinguish individuals with preL-HSPCs at high risk of developing AML from those with ARCH, we undertook deep targeted sequencing of genes recurrently mutated in AML in blood samples from 133 individuals in the European Prospective Investigation into Cancer and Nutrition (EPIC) study taken on average 6 years before they developed AML (pre-AML group), together with 683 matched healthy individuals (Control group). Pre-AML cases displayed accelerated age-correlated accumulation of somatic mutations. The identity, number and variant allele frequency (VAF) of mutations differed between the two groups, and were incorporated into a computational model of AML risk prediction that accurately distinguished pre-AML cases from controls on average 7 years prior to AML development. Our findings provide proof of concept that early prediction of AML development is feasible in high-risk populations, paving the way for early disease detection, monitoring, and potentially prevention. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 628
EGAD00001003102 We sequenced the polyA+ fraction of the RNA of leukocytes from 624 Sardinian individuals with RNA-seq. Prior to library preparation we added either ERCC ExFold RNA Spike-In. An average of 60M 51-bp paired-end reads per sample were generated on a HiSeq 2000 (Illumina). Sequencing reads were then aligned using STAR-2.2.0c2 to the hs37d5 reference genome supplemented with the ERCC spike-in sequences. We further provided an exon-exon junction database that we generated from the GENCODE v14 annotation. In order to remove contamination from a parallel experiment, we discarded any reads that mapped to the genomic regions of CBLB (chr3:105370773-105592330) and BCL11A (chr2:60672555-60784156). Filtered aligned reads (BAM format) are shared. Illumina HiSeq 2000 (ILLUMINA) 624
EGAD00001003580 Whole-genome sequencing (WGS) of 310 tumor-normal pairs from the ICGC ESAD-UK project (tumors 50x, normals 30x; HiSeq X BAM files). These samples are all available in ICGC release 26. Illumina HiSeq 2000 (ILLUMINA) 620
EGAD00001003761 This dataset contains fastq files with whole-genome sequencing data for the CPC-Gene Project. Data from each sample were generated using multiple whole-genome libraries and sequenced across multiple runs. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA, unspecified;ILLUMINA 617
EGAD00001003706 This dataset contains fastq files with whole-genome sequencing data for the CPC-Gene Project. Data from each sample were generated using multiple whole-genome libraries and sequenced across multiple runs. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA, unspecified;ILLUMINA 616
EGAD00010001143 HipSci - Healthy Normals - Expression Array - September 2016 Illumina 613
EGAD00010001147 HipSci - Healthy Normals - Genotyping Array - September 2016 Illumina 613
EGAD00001000251 De novo mutations in schizophrenia Illumina HiSeq 2000; 611 bam
EGAD00001001337 We propose to definitively characterise the somatic genetics of breast cancer through generation of comprehensive catalogues of somatic mutations in breast cancer cases by high coverage genome sequencing coupled with integrated transcriptomic and methylation analyses. 607
EGAD00010000952 Where Are You From? samples typed at 517K SNP loci Illumina HumanOmniExpress-24 BeadChip 598
EGAD00010000754 UK glioma case germline genotypes using Illumina HumanExome-12v1_A array Illumina HumanExome-12v1_A 596
EGAD00001000256 UK10K_NEURO_UKSCZ REL-2012-07-05 Illumina HiSeq 2000; 595 vcf
EGAD00010000773 HipSci - Healthy Normals - Genotyping Array - November 2014 Illumina 580
EGAD00010000775 HipSci - Healthy Normals - Expression Array - November 2014 Illumina 580
EGAD00001000193 UK10K_OBESITY_SCOOP REL-2012-02-22 Illumina HiSeq 2000; 573 vcf
EGAD00001003334 Targeted exome sequencing of patient derived xenografts from primary colorectal tumours and liver metastases. This dataset contains all the data available for this study on 2017-05-11. Illumina HiSeq 2000;ILLUMINA 573
EGAD00001000425 GENCORD2 RNA-seq BAM files using BWA Illumina Genome Analyzer II;, Illumina HiSeq 2000; 568 bam
EGAD00010000270 Metabric breast cancer samples (Images) Aperio image - H&E stained tissue_section 564
EGAD00010001202 Human genotyping data for patients infected by hepatitis C virus Affymetrix UKBiobank Array 563
EGAD00001000430 UK10K_NEURO_UKSCZ REL-2013-04-20 Illumina HiSeq 2000; 554 vcf
EGAD00001003601 The dataset for Direct Detection of Early-Stage Cancers using Circulating Tumor DNA includes 602 bam files from next-generation sequencing on the Illumina HiSeq2500 or MiSeq. The samples analyzed include cancer cell lines as well as plasma and tissue specimens from healthy individuals and patients with cancer. Illumina MiSeq;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 550
EGAD00010000268 Metabric breast cancer samples (Expression raw data) Illumina HT 12 543
EGAD00010000266 Metabric breast cancer samples (Genotype raw data) Affymetrix SNP 6.0 543
EGAD00010000381 MRCE sample using 300K Illumina 300K - GenomeStudio 543
EGAD00001001881 RIKEN collection of WGS reads for 543 liver cancer and matched blood or liver samples from 260 donors. Illumina HiSeq 2000;, Illumina Genome Analyzer IIx; 543 fastq
EGAD00001001395 Background: Invasive lobular breast cancer (ILBC) is the second most common histological subtype after ductal breast cancer (IDBC). In spite of significant clinical and pathological differences, ILBC is still treated as IDBC. Here, we aimed at identifying recurrent genomic alterations in ILBC with potential clinical implications. Methods: Starting from 630 ILBC primary tumors with a median follow up of 10 years, we interrogated oncogenic substitutions and indels of 360 cancer genes and genome-wide copy number alterations in 413 and 170 ILBC samples, respectively, and correlated those findings with clinical, pathological, and outcome features. The Cancer Genome Atlas database was used for comparison of frequency estimates. Results: Besides the high mutation frequency of CDH1 in 65% of the tumors, alterations in one of the three key genes of the PI3K pathway, PIK3CA, PTEN and AKT1, were present in more than half of the cases. ERBB2 and ERBB3 were mutated in 5.1 and 3.6% of the tumors. FOXA1 mutations and ESR1 copy number gains were detected in 9% and 25% of the samples. All these alterations were more frequent in ILBC than IDBC. The histological diversity of ILBC was associated with specific genomic alterations, such as enrichment for ERBB2 mutations in the mixed, non-classic subtype, and for ARID1A mutations and ESR1 gains in the solid subtype. Finally, ERBB2 and AKT1 mutations were associated with short-term risk of relapse, and chromosome 1q and 11p gain with increased and decreased breast cancer free survival, respectively. Conclusion: ERBB2, ERBB3 and AKT1 mutations represent high prevalence therapeutic targets in ILBC. FOXA1 mutations and ESR1 gains urgently deserve dedicated clinical investigation, especially in the context of endocrine treatment. Illumina HiSeq 2000; 541 bam,cram
EGAD00001002200 Whole exome sequencing of families with Congenital Heart Defects (182 trios). Collaboration with David Brook, University of Nottingham. Illumina HiSeq 2000; 541 cram
EGAD00010000961 Rheumatic heart disease cases recruited in Fiji HumanCore-24 BeadChip 535
EGAD00001001642 RIKEN collection of WGS reads of 530 liver cancer and matched blood samples from 260 donors. Illumina HiSeq 2000; 530 fastq
EGAD00001000335 UK10K_NEURO_UKSCZ REL-2012-11-27 Illumina HiSeq 2000; 527 vcf
EGAD00010000850 BLUEPRINT DNA methylation profiles of monocytes, neutrophils and T cells from healthy donors Illumina 450K 525
EGAD00001002155 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: LIRI-JP. 524 readme_file,bai,bam
EGAD00001003321 Systematic next generation sequencing efforts are beginning to define the genomic landscape across a range of primary tumours, but we know very little of the mutational evolution that contributes to disease progression. We therefore propose to obtain a comprehensive description of genomic, transcriptomic and epigenomic changes in a cohort of matched primary and metastatic colorectal cancers, and additionally to explore the extent to which those mutations identified as recurrent in the metastatic setting are able to subvert normal biological processes using both genetically engineered mouse models and established cancer cell lines. This study will enable us to define to what extent primary tumour profiling can capture the biological processes operative in matched metastases as well as the significance of intratumoural heterogeneity. This dataset contains all the data available for this study on 2017-05-04. Illumina HiSeq 2000;ILLUMINA 523
EGAD00001003292 Illumina HiSeq 2000;ILLUMINA 520
EGAD00000000026 Randomly-selected, unrelated individuals Illumina 610-Quad 518
EGAD00001001595 ICGC PACA-CA Release 20 Illumina HiSeq 2000;, Illumina HiSeq 2500; 516 bam,fastq
EGAD00001001956 ICGC Release 21 for PACA-CA from OICR Illumina HiSeq 2000;, Illumina HiSeq 2500; 516 fastq,bam
EGAD00001003583 516 DNA samples were collected from individuals upon enrollment into the European Prospective Investigation into Cancer and Nutrition study between 1993 and 1998 across 17 different centers. 126 bp paired-end reads from the Illumina platform were converted to fastq format; the 2 bp molecular barcode in each read of the pair was trimmed and written into the read name. The thymine nucleotide required for ligation was removed from the sequences. The Burrows-Wheeler Aligner (BWA-MEM) was used to align the processed fastq files to the hg19 reference genome, followed by indel realignment using GATK. An in-house algorithm was written to collapse read families that share the same molecular barcode sequence. 516
EGAD00000000037 NcOEDG Stockholm 2 samples Affymetrix 5.0 514
EGAD00001001095 Supporting data for ICGC PACA-CA Release 18 Illumina HiSeq 2000;, Illumina HiSeq 2500; 506 bam,fastq
EGAD00010001410 Genotyped samples using Illumina Infinium HumanCoreExome Beadchip Illumina Infinium HumanCoreExome Beadchip 502
EGAD00001000744 The samples in this panel come from 250 families: 248 parent-child trios and 2 parent-child duos. As the children do not provide additional haplotypes or population information, they were excluded from the panel. The samples present in the release are composed of 248 couples, 2 single individuals and 1 sample composed of the 2 haplotypes transmitted to the duos' children by their missing parents. The composed sample is named gonl-220c_223c. The files contain a total of 18.9M SNVs and 1.1M INDELs in autosomal chromosomes. They were generated by phasing/imputing the SNVs (a) and INDELs (b) using MVNCall. Only sites passing filters are reported. Sites filtered as part of the GoNL inaccessible genome were kept (but flagged as filtered) and may still contain true positive calls, but they should be used with care because they lie in parts of the genome that are less well captured (systematically under- or over-covered, or with low mapping quality). 499 vcf
EGAD00001002127 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: PBCA-DE. 496 readme_file,bai,bam
EGAD00010000922 Subset 1 of osteoarthritis cases from the arcOGEN Consortium (http://www.arcogen.org.uk/) genotyped on HumanCoreExome-24v1-0 with broader consent. Illumina HumanCoreExome-24v1-0 494
EGAD00001001983 Immunoglobulin heavy chain gene high-throughput sequencing of paediatric acute lymphoblastic leukaemia samples on the Illumina MiSeq platform, for the purpose of minimal residual disease (MRD) analysis. This dataset contains summary fastq files and raw bcl files from the MiSeq for this study. In the study we identify errors associated with multiplexing that could potentially affect the accuracy of MRD analysis. We optimise a strategy combining high-purity, sequence-optimised oligonucleotides, dual indexing and an error-aware demultiplexing approach to minimise errors and maximise sensitivity. Illumina MiSeq; 491 fastq
EGAD00001001946 The Prenatal Assessment of Genomes and Exomes (PAGE) study is a multicentre prospective trial, performing exome sequence analysis on samples from 1000 families with structural anomalies in prenatal ultrasound screening but normal aneuploidy results. The data will enable discovery of novel genetic disorders and increase the diagnostic yield. Where appropriate, results will be reported back to the families at the end of the pregnancy, after thorough clinical review. Ultimately, the translation of the acquired know-how into cost-effective prenatal diagnostic sequencing will improve genetics-derived prognoses and allow more informed parental counselling as well as management of pregnancy and childbirth. Illumina HiSeq 2000; 489 cram
EGAD00000000036 NcOEDG Stockholm 1 samples Affymetrix 500K 484
EGAD00001002898 Oligocapture sequencing libraries from the study "Histological Transformation and Progression in Follicular Lymphoma: a Clonal Evolution Study". These are sequencing libraries from the extension cohort of 277 patients. Specifically, there are 402 tumor libraries and 82 normal libraries. 484
EGAD00001003127 WGS data of medulloblastoma tumor/control pairs. 482
EGAD00001000412 We are sequencing the exomes of patients with paroxysmal neurological disorders mainly focusing on migraine and epilepsy. Cases are collected from performance sites of members of the International Headache Genetics consortium and EuroEPINOMICS. Most cases have a strong family history. The study sample will include both cases and controls. Illumina HiSeq 2000; 477 bam,cram
EGAD00001001941 Variants derived from mapped whole transcriptome RNA-Seq data from 476 human samples of early stage urothelial carcinoma. 476 vcf
EGAD00001001939 Mapped whole transcriptome RNA-Seq data from 476 human samples of early stage urothelial carcinoma. Illumina HiSeq 2000; 476 bam
EGAD00001001940 Un-mapped whole transcriptome RNA-Seq data from 476 human samples of early stage urothelial carcinoma. Illumina HiSeq 2000; 476 bam
EGAD00010001400 Difference in gene expression values between case and control, log2 values. Blood transcriptome from women participating in the Norwegian Women and Cancer study (NOWAC) Post-genome Cohort, taken up to eight years before breast cancer diagnosis. Illumina HumanWG-6 version 3 or Illumina HumanHT-12 expression bead chip, combined on identical nucleotide universal identifiers. Illumina HumanWG-6 467
EGAD00010000957 Rheumatic heart disease cases recruited in New Caledonia HumanCore-24 BeadChip 465
EGAD00010000923 Subset 2 of osteoarthritis cases from the arcOGEN Consortium (http://www.arcogen.org.uk/) genotyped on HumanCoreExome-12v1-0 with consent for osteoarthritis studies only. Illumina HumanCoreExome-12v1-0 463
EGAD00001001880 RIKEN collection of RNA-seq reads for 458 liver cancer samples and matched normal liver from 247 donors. Illumina HiSeq 2000;, Illumina Genome Analyzer IIx; 458 fastq
EGAD00010000916 BASIS breast cancer DNA methylation Illumina 450k Illumina 450k 457
EGAD00010001062 Blood-based gene expression from breast cancer cases and age-matched controls IlluminaHuman AWG-6 and HT12 455
EGAD00001000825 This study aims to define the landscape of somatic mutations in sun-exposed human skin by deep sequencing, analyse their frequency and use the data to infer the effect of mutations on proliferating cell behaviour. The frequency of each mutation will reflect the size of the clone of cells in the tissue sample. By analysing small samples, clones with as few as 100 cells will be detectable. Allele frequency distributions for each mutation will be used to infer cell fate using published methods (Klein et al. 2010). This study will shed unprecedented light on the early clonal events that lead to the emergence of cancer. Illumina HiSeq 2000; 454 cram
EGAD00001001026 The offspring of first cousin marriages have ~6% of their genome autozygous, i.e. homozygous identical by descent, or even more if there was further consanguinity in their ancestry. In the UK there are large populations with very high first cousin marriage rates of 20-50%. Sequencing the exomes of a sample of these individuals has the potential both to support genetic health programmes in these populations, and to provide genetic research information about rare loss of function mutations. This pilot study based on existing British-Pakistani cohort samples from Birmingham will identify homozygous individuals for almost all variants down to an allele frequency around 1%, plus individuals carrying hundreds of new homozygous rare loss-of-function variants, and will support development of community relations and ethics for a wider study currently being designed. The data deposited in the EGA consists of low coverage whole exome sequencing on these samples. Illumina HiSeq 2000; 452 cram
EGAD00001001426 Systematic next generation sequencing efforts are beginning to define the genomic landscape across a range of primary tumours, but we know very little of the mutational evolution that contributes to disease progression. We therefore propose to obtain a comprehensive description of genomic, transcriptomic and epigenomic changes in a cohort of matched primary and metastatic colorectal cancers, and additionally to explore the extent to which those mutations identified as recurrent in the metastatic setting are able to subvert normal biological processes using both genetically engineered mouse models and established cancer cell lines. This study will enable us to define to what extent primary tumour profiling can capture the biological processes operative in matched metastases as well as the significance of intratumoural heterogeneity. This dataset contains all the data available for this study on 2015-07-02. Illumina HiSeq 2000; 446 cram
EGAD00001002651 Presurgical studies allow study of the relationship between mutations and response of estrogen receptor positive (ER+) breast cancer to aromatase inhibitors (AIs) but have been limited to small biopsies. Here in Phase I of this study, we perform exome sequencing on baseline, surgical core-cuts and blood from 60 patients (40 AI treated, 20 controls). In poor responders (based on Ki67 change) we find significantly more somatic mutations than in good responders. Subclones exclusive to baseline or surgical cores occur in approximately 30% of tumours. In Phase II we combine targeted sequencing on another 28 treated patients with Phase I. We find six genes frequently mutated: PIK3CA, TP53, CDH1, MLL3, ABCA13 and FLG, with 71% concordance between paired cores. TP53 mutations are associated with poor response. We conclude that multiple biopsies are essential for confident mutational profiling of ER+ breast cancer and that TP53 mutations are associated with resistance to oestrogen deprivation therapy. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2000; 443 bam
EGAD00010000714 aplastic anemia samples tumor using 250K Affymetrix 250K Nsp-GTYPE 440
EGAD00010000630 The TEENAGE study target population comprised adolescent students aged 13–15 years attending the first three classes of public secondary schools located in the wider Athens area of Attica. 436
EGAD00001002747 Whole-exome sequencing (WES) of 216 breast cancer metastasis-normal pairs from patients who underwent a biopsy in the context of the SAFIR01, SAFIR02, SHIVA or MOSCATO prospective trials (France). Illumina HiSeq 2500;ILLUMINA, NextSeq 500;ILLUMINA, Illumina HiSeq 4000;ILLUMINA 432
EGAD00001003360 Bam files containing mitochondrial alignments, extracted from CPCGene Whole Genome Alignments 432
EGAD00001003361 VCF files containing mitochondrial variant calls using MToolbox 432
EGAD00001000300 UK10K_OBESITY_GS_REL_2012_07_05 Illumina HiSeq 2000; 430 bam
EGAD00001000431 UK10K_OBESITY_GS REL-2013-04-20 Illumina HiSeq 2000; 428 vcf
EGAD00001003206 BACKGROUND TRACERx (TRAcking Cancer Evolution through therapy (Rx)) is a prospective cohort study designed to investigate intratumor heterogeneity (ITH) in relation to clinical outcome, and to determine the clonal nature of driver events and evolutionary processes in early stage non-small cell lung cancer (NSCLC). METHODS Multiregion high-depth whole-exome sequencing (M-seq) was performed on 100 early stage NSCLC tumors resected prior to systemic therapy. A total of 327 tumor regions were sequenced and analyzed to define evolutionary histories, obtain a census of clonal and subclonal events, and assess the relationship between ITH and recurrence-free survival (RFS). RESULTS Widespread ITH was observed for both somatic copy number alterations (median 48% [0.03-88%]) and mutations (median 30% [0.5-93%]). Driver mutations in EGFR, MET, BRAF and TP53 were almost always clonal. However, heterogeneous driver alterations occurring later in evolution were found in over 75% of tumors and were common in PIK3CA, NF1 and genes involved in chromatin modification and DNA damage response and repair. Genome doubling and ongoing dynamic chromosomal instability (CIN), illustrated by mirrored subclonal allelic imbalance, were identified as causes of ITH resulting in parallel evolution of driver copy number events, including amplifications of CDK4, FOXA1, and BCL11A. Elevated copy number heterogeneity was associated with shorter RFS (HR=4.9, P=0.00044), which remained significant in a multivariate analysis. CONCLUSIONS ITH mediated through CIN, rather than point mutational heterogeneity, was associated with increased risk of relapse, supporting its value as a prognostic predictor, and the need to target this high-risk phenotype. 427
EGAD00001000309 UK10K_OBESITY_GS REL-2012-11-27 Illumina HiSeq 2000; 424 vcf
EGAD00001001215 Targeted sequencing follow-up of genomic lesions in multiple myeloma. Illumina HiSeq 2000; 424 cram
EGAD00001003787 BBMRI - BIOS project - Freeze 2 - Fastq files - GoNL samples Illumina HiSeq 2000;ILLUMINA 420
EGAD00001003786 BBMRI - BIOS project - Freeze 2 - Bam files - GoNL samples Illumina HiSeq 2000;ILLUMINA 420
EGAD00001001096 Illumina HiSeq 2000; 419 bam
EGAD00001003330 The samples will be sequenced for a targeted panel of cancer relevant genes (n ~ 370) and analysed for somatic mutations. This dataset contains all the data available for this study on 2017-05-11. Illumina HiSeq 2000;ILLUMINA 416
EGAD00001000800 This project aims to study exomes from families and trios with congenital heart disease (CHD). The samples have been collected under the Competence Network - Congenital Heart Defects in Berlin, Germany. The phenotypes are mainly left ventricular outflow obstruction (aortic stenosis, bicuspid aortic valve disease, coarctation and hypoplastic left heart), but will also include samples with hypoplastic right heart and atrioventricular septal defects. We will perform whole exome sequencing using Agilent sequence capture and Illumina HiSeq sequencing. Illumina HiSeq 2000; 406 bam,cram
EGAD00010000652 Genotyped samples using Illumina HumanOmni2.5 402
EGAD00001003205 160 WES and 25 WGS for HBV-related HCC, and 15 WES for ICC, belonging to LICA-CN Illumina HiSeq 2000;ILLUMINA 402
EGAD00001000199 ORCADES_WGA Illumina HiSeq 2000; 400 bam
EGAD00001000783 Genomic libraries will be generated from total genomic DNA derived from 200+ patients with childhood Transient Myeloproliferative Disorder (TMD) and/or Acute Megakaryocytic Leukemia (AMKL) as well as some matched constitutional samples (n < 50). Libraries will be enriched for a selected panel of genes using a bespoke pulldown protocol. 96 samples will be individually barcoded and subjected to up to two lanes of Illumina HiSeq. Paired reads will be mapped to build 37 of the human reference genome to facilitate the characterisation of known gene mutations in cancer as well as the validation of potentially novel variants identified by prior exome sequencing. Illumina HiSeq 2000; 400 cram
EGAD00010000917 399 tumors profiled using Agilent miRNA microarrays (Product Number G4872A, design ID 046064). The arrays are based on miRBase release 19.0 and 2006 human miRNAs are represented. 150 ng total RNA was used as input. Agilent miRNA microarrays 399
EGAD00001001315 Phenotype determination by SNP typing using PCR and SNaPshot PCR with subsequent fragment analysis. We investigated 400 individuals from Northern Germany and detected up to 12 different SNPs to determine eye, hair and skin colour. More than 1000 different runs on an ABI3130 were performed. This dataset includes: - Phenotype information for 400 samples - Summary and complete genotype calls for 12 SNPs on 400 samples. 399 phenotype_file
EGAD00010000389 Cambridge control samples using a 24k expression array from Illumina Illumina Human-Ref 8 v3.0 expression array 395
EGAD00001000403 The ENGAGE project is an FP7-funded EU project aiming to combine genetic and phenotype information from European population-based cohorts. In this sub-project we aim to perform whole exome sequencing of individuals selected from the Health 2000 and FINRISK cohorts. Individuals have been selected based on their metabolic trait phenotypes. Illumina HiSeq 2000; 394 bam
EGAD00010000383 MRCA sample using 100K Illumina 100K - GenomeStudio 394
EGAD00010000385 MRCA sample using 300K Illumina 300K - GenomeStudio 394
EGAD00001001029 This dataset covers the sequencing of coding and putative regulatory sequences of 38 genes associated with either sporadic or Mendelian forms of Parkinson's disease Illumina HiSeq 2000; 394 bam
EGAD00001002685 Breast cancer PDTX sequencing data from Bruna et al, Cell 2016 - Exome Sequencing - Shallow Whole Genome Sequencing - RRBS Methylation Sequencing Illumina HiSeq 2500;ILLUMINA, Illumina HiSeq 2500; 393 bam
EGAD00001000433 UK10K_NEURO_ABERDEEN REL-2013-04-20 Illumina HiSeq 2000; 392 vcf
EGAD00001001088 RNAseq BAM files for the blood samples of the EUROBATS project 391 bai,bam
EGAD00010001075 Argentine samples using 250K Illumina Exome 250K 391
EGAD00001003594 This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ This dataset contains all the data available for this study on 2017-08-29. Illumina HiSeq 2500;ILLUMINA, Illumina HiSeq 4000;ILLUMINA 391
EGAD00001002698 Recurrent breast cancer is almost universally fatal. We characterize the locally relapsed or distant metastatic cancers of 170 patients using massively parallel sequencing. We identify that the relapse-seeding clone disseminates late from the primary tumor. TP53 and AKT1 mutations appear to be enriched in ER-positive cancers predisposed to relapse. Mutation acquisition continues at relapse as the same mutation signatures continue to operate, and new signatures, such as that caused by radiotherapy, appear de novo. In 49% of cases we identify driver mutations private to the relapse, and these are sampled from a wider range of cancer genes, including SWI-SNF complex and JAK-STAT signaling genes. Illumina MiSeq;ILLUMINA, Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 387 cram
EGAD00001003748 Sequencing of B-cell receptor repertoires in healthy individuals and patients with chronic lymphocytic leukemia. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute please see http://www.sanger.ac.uk/datasharing/ This dataset contains all the data available for this study on 2017-09-13. Illumina MiSeq;ILLUMINA 387
EGAD00001003150 Microfluidic direct library preparation (DLP) single-cell whole-genome BAM files for fourth-passage patient-derived primary triple-negative breast cancer xenograft SA501X4F. Illumina HiSeq 2500;ILLUMINA 384
EGAD00001003149 Microfluidic direct library preparation (DLP) single-cell whole-genome BAM files for third-passage patient-derived primary triple-negative breast cancer xenograft SA501X3F. Illumina HiSeq 2500;ILLUMINA 384
EGAD00001001857 Illumina HiSeq 2000; 381 fastq
EGAD00010001326 Papuan Genotyping Illumina Multi-EthnicGlobal_A1 380
EGAD00001001994 CCA targeted sequencing Illumina HiSeq 2500; 376
EGAD00001001018 The samples will be sequenced for a targeted panel of cancer relevant genes (n ~ 370) and analysed for somatic mutations. This dataset contains all the data available for this study on 2014-09-24 Illumina HiSeq 2000; 374 cram
EGAD00010000692 Genome-wide DNA methylation epigenotyping of African rainforest hunter-gatherers and neighbouring agriculturalists by Illumina HumanMethylation450 372
EGAD00001003388 Aligned, merged and deduplicated BAM files from HiSeq whole genome sequencing of 366 samples: matched tumour-normal pairs from 183 melanoma patients. 366
EGAD00001000812 Sequencing of 350 cancer genes in BC samples from patients treated with either Epirubicin or Paclitaxel monotherapy in the neoadjuvant setting. Illumina HiSeq 2000; 364 cram
EGAD00001000619 Experiments using targeted pulldown methods will be sequenced to validate findings in the exomes of patients with Myeloproliferative Neoplasms (MPN). Illumina HiSeq 2000; 360 bam
EGAD00010001310 iOmics lipid data via mass spectrometry (MS) Agilent 1200 LC system 359
EGAD00010000954 Healthy volunteers recruited in New Caledonia HumanCore-24 BeadChip 356
EGAD00001002249 Single-Cell RNA Sequencing of 355 cells isolated from 7 tissue fragments of 3 patients corresponding to locally adjacent tumor, multifocal with recurrence and sections segregated by a marker of tumor cellularity (5-ALA). Illumina HiSeq 2500; 355
EGAD00001000660 Analysis .bam files from HiSeq sequencing of Australian ICGC PDAC study samples, submitted 20130826 353 bam
EGAD00001000080 Genomics of Colorectal Cancer Metastases - Massively Parallel Sequencing of Matched Primary and Metastatic tumours to Identify a Metastatic Signature of Somatic Mutations (MOSAIC) Illumina HiSeq 2000, Illumina HiSeq 2000; 351 bam,cram
EGAD00010001308 iOmics miRNA data via qPCR quantification patented mSMRT-qPCR miRNA assay (MIRXES) 351
EGAD00001000227 EGAD00001000227_UK10K_NEURO_ABERDEEN_REL_2012_07_05 Illumina HiSeq 2000; 347 vcf
EGAD00001003140 We analyzed the spectrum and clinical significance of MYC and BCL2 mutations in 347 DLBCL cases from a population-based cohort in British Columbia (BC), Canada. Illumina MiSeq;ILLUMINA 347
EGAD00001002192 Additional sequencing data for 173 donors in EGAS00001000154, a study of Pancreatic Ductal Adenocarcinoma. WGS libraries were used for high-cellularity cases, WXS sequencing to high depth on low-cellularity cases. HiSeq 2xxx platform was used in all cases. The analysis files associated with this dataset are merged, de-duplicated bams aligned against GRCh37, one tumour and one normal bam per donor. 346 bam
EGAD00010001319 Medulloblastoma methylation profiling Illumina Infinium HumanMethylation450 BeadChip 345
EGAD00010000915 Affymetrix SNP6.0 breast cancer genotyping data Affymetrix SNP6.0 344
EGAD00001002705 McGill EMC Release 6 data Illumina HiSeq 2500;ILLUMINA 343 fastq
EGAD00001000614 UK10K_NEURO_ASD_SKUSE REL-2013-04-20 Illumina HiSeq 2000; 341 vcf
EGAD00001000678 FFPE CPA accreditation of genome-scale sequencing in routinely collected formalin-fixed paraffin-embedded (FFPE) cancer specimens versus matched fresh-frozen samples using targeted pulldown capture prior to Illumina sequencing. Illumina HiSeq 2000; 341 bam,cram
EGAD00010000664 Finnish population cohort genotyping_B 340
EGAD00001001126 340 other
EGAD00010000464 Down syndrome SNP genotyping data Illumina 550K - Illumina Genome Studio 338
EGAD00001002668 Metagenomic shotgun sequencing of Irritable bowel syndrome patients and matched controls Illumina HiSeq 2000; 336
EGAD00001000879 Genomic libraries will be generated from total genomic DNA derived from 200+ patients with childhood Transient Myeloproliferative Disorder (TMD) and/or Acute Megakaryocytic Leukemia (AMKL) as well as some matched constitutional samples (n < 50). Libraries will be enriched for a selected panel of genes using a bespoke pulldown protocol. 96 samples will be individually barcoded and subjected to up to two lanes of Illumina HiSeq. Paired reads will be mapped to build 37 of the human reference genome to facilitate the characterisation of known gene mutations in cancer as well as the validation of potentially novel variants identified by prior exome sequencing. Illumina HiSeq 2500; 335 cram
EGAD00001001631 Illumina MiSeq; 334 fastq
EGAD00001001872 Targeted exome sequencing of patient derived xenografts from primary colorectal tumours and liver metastases. This dataset contains all the data available for this study on 2016-01-06. Illumina HiSeq 2000; 333 cram
EGAD00001000965 Cancers are ecosystems of genetically related clones, competing across space and time for limited resources. To understand the clonal structure of primary breast cancer, we applied genome and targeted sequencing to 295 samples from 49 patients’ tumors. The extent of subclonal diversification varied considerably among patients and encompassed many spatial patterns, including local growth, intraductal dissemination and clonal intermixture. Landmarks of disease progression, such as acquiring invasive or metastatic potential, arose within detectable subclones of antecedent lesions, suggesting that subclonal mutations could be relevant if actionable. No defined temporal order of mutation was evident, with the commonest genes, including PIK3CA, TP53, BRCA2, PTEN and MYC, mutated early in some, late in others, often exhibiting parallel evolution across subclones. Signatures of homologous recombination deficiency correlated with response to neoadjuvant chemotherapy. Thus, the interplay of mutation, growth and competition drives clonal structures of breast cancer that are complex, variable across patients and clinically relevant. Illumina HiSeq 2000; 331 bam,cram
EGAD00001000877 Complete WGS and RNA-Seq dataset for Australian ICGC ovarian cancer sequencing project 2014-07-07, representing 93 donors. Sequencing was performed on Illumina HiSeq. Alignment of the lane-level fastq data was performed with bwa (WGS data) and RSEM (transcriptome data). For this dataset lane-level .bam files have been merged and de-duplicated to create a single bam file for each sample type (tumour/normal) for each donor. This dataset supersedes all previous datasets for this study. 331 bam
EGAD00001000875 The CR07 clinical trial recruited patients with clinically operable rectal adenocarcinoma. Patients were randomized to either pre-operative short-course radiotherapy or initial surgery followed by chemo-radiotherapy only in those patients at high risk of local relapse. Patients in both arms then received standard 5-FU-based adjuvant chemotherapy as per local policy. We intend to use FFPE-derived DNA from the primary tumours to identify patterns of mutations or copy number alterations that are predictive of local or distant relapse. Illumina HiSeq 2000; 330 cram
EGAD00001001303 The dataset for the PROP1 study consists of samples of patients with combined pituitary hormone deficiency due to two most prevalent mutations in the PROP1 gene (c.301_302delGA and c.150delA) and healthy relatives and controls. All subjects were genotyped for 21 single nucleotide polymorphisms surrounding the PROP1 gene in order to assess the potential ancestral origin of the respective mutations. The genotype data are displayed in the vcf format. 328 vcf
EGAD00001000108 Paroxysmal neurological disorders Illumina HiSeq 2000, Illumina Genome Analyzer II, Illumina HiSeq 2000; 327 bam,srf
EGAD00001000407 We are sequencing the exomes of patients with paroxysmal neurological disorders mainly focusing on migraine and epilepsy. Cases are collected from performance sites of members of the International Headache Genetics consortium and EuroEPINOMICS. Most cases have a strong family history. The study sample will include both cases and controls. Illumina HiSeq 2000; 327 bam
EGAD00001002667 Additional files for "The Genomic Landscape of Core-Binding Factor Acute Myeloid Leukemias" (EGAS00001000349). This dataset includes the processed Excap data referenced in this paper. Illumina HiSeq 2000;ILLUMINA 327 bam
EGAD00001003783 Recent studies using next-generation sequencing strategies have described the landscape of genetic alterations in diffuse large B-cell lymphoma (DLBCL). However, little is known about the clinical relevance of recurrent mutations and copy number alterations and their transcriptional footprints. This study examines the frequency, interaction and clinical impact of recurrent genetic aberrations in DLBCL using high-resolution technologies in a large population-based cohort. 324
EGAD00010001309 iOmics genomic data using 2.5M and Exome array Illumina 2.5M and Illumina Exome array 323
EGAD00001000808 RIKEN collection WGS reads for 321 HCC and blood matched samples from 158 donors submitted to ICGC for release 15 Illumina HiSeq 2000;, Illumina Genome Analyzer IIx; 321 fastq
EGAD00001000231 EGAD00001000231_UK10K_NEURO_ASD_SKUSE_REL_2012_07_05 Illumina HiSeq 2000; 320 vcf
EGAD00001002678 The data set consists of low-pass whole genome sequence data of single CTCs, pools of CTCs and germline controls for a cohort of 31 SCLC patients at baseline, and for 5 patients at relapse. In addition, 9 CDX models and associated germline controls (where available) are included. Illumina MiSeq;ILLUMINA, Illumina HiSeq 2500;ILLUMINA, NextSeq 500;, NextSeq 500;ILLUMINA 319 fastq
EGAD00001003274 Whole genome sequencing data for MMML (tumor/control pairs and one cell_line) 315
EGAD00001003318 RNA-sequencing alignment for SYSCOL colorectal adenoma-carcinoma samples 314
EGAD00001000315 UK10K_NEURO_ABERDEEN REL-2012-11-27 Illumina HiSeq 2000; 313 vcf
EGAD00001001427 Targeted cancer gene sequencing of samples enrolled in the SSGXVIII trial from Finland. Illumina HiSeq 2000; 312 cram
EGAD00001000313 UK10K_NEURO_ASD_SKUSE REL-2012-11-27 Illumina HiSeq 2000; 305 vcf
EGAD00001001041 Comparison of genomic rearrangements and DNA methylation patterns between different foci of multiple synchronous (multifocal and multicentric) invasive breast cancers. Illumina Genome Analyzer II;, Illumina HiSeq 2000; 305
EGAD00000000041 GenomEUtwin Swedish (SWE) samples Illumina HumanHap 300 302
EGAD00010001025 BLUEPRINT DNA methylation profiles of monocytes, T cells and B cells in type 1 diabetes-discordant monozygotic twins Illumina 450K 302
EGAD00001000277 High Quality Variant Call files, generated by bioscope, converted to vcf format. Complete dataset for all 300 samples. 300 vcf
EGAD00001000680 Single-end short-read (50 bp) SOLiD 4 sequencing data for 300 individuals, constituting 100 patient-parent trios. For more details, see http://www.nejm.org/doi/full/10.1056/NEJMoa1206524 AB SOLiD 4 System; 300 bam
EGAD00001001466 Whole genome sequencing. 2 µg of genomic DNA from each sample was used for the construction of two short-insert paired-end sequencing libraries. Both types of libraries were sequenced in paired-end mode on Illumina GAIIx (2 × 151 bp) using Sequencing kit v4 or Illumina HiSeq2000 (2 × 101 bp) using TruSeq SBS Kit v3. 300 bam
EGAD00010000892 Healthy individuals from Italy Illumina 300
EGAD00001001674 Illumina MiSeq;, Illumina HiSeq 2500; 299 bam
EGAD00001002886 Exome sequencing of North American Brain Expression Consortium (NABEC) subjects. Illumina HiSeq 2000;ILLUMINA 298 fastq
EGAD00001003162 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: PACA-CA. 298
EGAD00010000460 GENCORD2 DNA methylation 294
EGAD00001002725 Autism spectrum disorder (ASD) is a collection of neurodevelopmental disorders characterized by deficits in social interaction and social communication, along with restricted and repetitive behaviour patterns. We globally interrogated the histone acetylomes of enhancers in a large cohort of ASD and control samples by analyzing postmortem tissue from three brain regions: prefrontal cortex (PFC), temporal cortex (TC) and cerebellum (CB). H3K27ac was selected as the representative acetylation mark, and 288 ChIP-seq experiments were performed on these postmortem samples. Illumina HiSeq 2000;ILLUMINA 291 bam
EGAD00001000066 Breast Cancer Follow Up Series Illumina Genome Analyzer II 288 bam
EGAD00010000144 Healthy volunteer collection of European ancestry Illumina OmniExpress v1.0 - Illumina GenomeStudio 288
EGAD00001002180 Targeted pulldown of genes known to be recurrently mutated in AML & MDS from patient and normal samples using Agilent Sureselect and for some cases also using Illumina Truseq technology. Illumina HiSeq 2000; 288 cram
EGAD00001000439 UK10K_NEURO_FSZNK REL-2013-04-20 Illumina HiSeq 2000; 285 vcf
EGAD00001002652 50 ng of genomic double-stranded DNA was enzymatically sheared to an average size of 200 bp. Further processing was performed using the Illumina Nextera Rapid Capture Custom Kit (Illumina), and 100 bp paired-end sequencing was performed with 24 samples per lane on an Illumina HiSeq 2000 (Illumina) to reach a coverage of 100-1000x. 284 bam,bai
EGAD00001000234 EGAD00001000234_UK10K_NEURO_FSZNK_REL_2012_07_05 Illumina HiSeq 2000; 281 vcf
EGAD00001003291 This dataset represents RNA-sequencing data from 278 primary colon cancers obtained from fresh-frozen tumor sections. RNA-sequencing was performed using TruSeq library preparation and samples were sequenced on Illumina NextSeq and HiSeq. The data are available as Illumina NextSeq and HiSeq fastq files (_R1.fastq and _R2.fastq for each tumor sample, 556 files in total). NextSeq 500 (ILLUMINA), Illumina HiSeq 2500 (ILLUMINA) 278
EGAD00010001198 Case control samples using Infinium Omni2.5 Infinium Omni2.5M 274
EGAD00001000183 UK10K_NEURO_FSZNK REL-2012-01-13 Illumina HiSeq 2000; 273 vcf
EGAD00001003602 Dataset consisting of: (1) N=234 genome-wide chromatin accessibility (ATAC-seq) profiles for distinct N=21 healthy old and N=28 healthy young subjects. ATAC-seq biological samples provided for the following tissues: PBMC (N=24), CD14+ monocytes (N=18), CD8+ memory T cells (N=7), CD8+ naive T cells (N=7), CD4+ memory T cells (N=7), CD4+ naive T cells (N=7), and naive B cells (N=7). (2) N=39 genome-wide transcription (RNA-seq) data for distinct N=15 healthy old and N=24 healthy young subjects' PBMCs. Illumina HiSeq 2500;ILLUMINA 273
EGAD00001002241 Sequencing data for ICGC Oesophageal Adenocarcinoma tissue samples - chemo_cohort Illumina HiSeq 2000; 270
EGAD00010001307 iOmics gene expression data using Expression Array Affymetrix Human Gene 1.0 ST Array 269
EGAD00001001012 The need for a detailed catalogue of local variability for the study of rare diseases within the context of the Medical Genome Project motivated the whole exome sequencing of 267 unrelated individuals, representative of the healthy Spanish population. AB 5500xl Genetic Analyzer; 267 fastq
EGAD00001003101 The need for a detailed catalogue of local variability for the study of rare diseases within the context of the Medical Genome Project motivated the whole exome sequencing of 267 unrelated individuals, representative of the healthy Spanish population. 267 vcf
EGAD00001001238 Extension analysis to pursue candidate genes of interest in chordoma Illumina HiSeq 2000; 262 cram
EGAD00001001239 Extension analysis to pursue candidate genes of interest in chordoma Illumina HiSeq 2000; 262 cram
EGAD00001001932 HipSci - Healthy Normals - Exome Sequencing - January 2016 Illumina HiSeq 2000; 262 tabix,cram,vcf,bai,bam
EGAD00001003139 Aligned sequence data for 124 CPCGene Tumour/Normal Pairs from the 200PG Study 262
EGAD00001001290 McGill EMC Release 4 for assay "RNA-seq": Transcriptome profiling by high-throughput sequencing Illumina HiSeq 2500; 261 fastq
EGAD00010000496 Genome-wide SNP genotyping of African rainforest hunter-gatherers and neighbouring agriculturalists Illumina HumanOmni1-Quad-Illumina GenomeStudio 260
EGAD00001000332 UK10K_NEURO_FSZNK REL-2012-11-27 Illumina HiSeq 2000; 258 vcf
EGAD00001002218 Sequencing data for ICGC Oesophageal Adenocarcinoma tissue samples - 129_cohort EAC whole genomic sequencing data - Publication Secrier & Li et al., 2016, Nature Genetics Illumina HiSeq 2000; 258 bam
EGAD00001001636 Whole-genome sequencing at 4x of 250 samples from the Greek isolate collection HELIC Illumina HiSeq 2000; 250 bam
EGAD00010000578 Gencode case samples using 550K 249
EGAD00001001382 TwinsUK whole exome sequencing using NimbleGen SeqCap EZ 248 bam,bai
EGAD00010000927 Subset 2 of osteoarthritis cases from the arcOGEN Consortium (http://www.arcogen.org.uk/) genotyped on HumanCoreExome-24v1-0 with consent for osteoarthritis studies only. Illumina HumanCoreExome-24v1-0 248
EGAD00010001218 Raw Array data from the CPCGene 200PG study Affymetrix OncoScan FFPE Express 248
EGAD00001000405 In this project we will sequence the exomes of 250 patients with Parkinson's disease Illumina HiSeq 2000; 247 bam
EGAD00001002195 The aim of this project is to identify rare genetic variants of large effect implicated in complex diseases by studying cardiovascular diseases and related quantitative traits in a well-characterized isolated population in the Cilento area, Italy. The reference panel was selected carefully to maximize imputation coverage and quality across all population samples. The selected individuals should meet three criteria: they should be chip-genotyped and closely related to the maximum number of chip-genotyped individuals, so as to maximize imputation coverage; and relatedness between selected individuals should be minimal, so as to minimize redundancy in the genetic information of the reference panel. We performed exome sequencing on samples from 250 individuals from the Campora and Gioi-Cardile populations. Illumina HiSeq 2000; 247 cram
EGAD00001001094 Raw Fastq files for 124 CPCGene Tumour/Normal Pairs from the 200PG Study Illumina HiSeq 2500;ILLUMINA, Illumina HiSeq 2500; 247 fastq
EGAD00010001301 Medulloblastoma expression profiling Affymetrix expression array 246
EGAD00001002128 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: PRAD-CA. 244 readme_file,bai,bam
EGAD00001000141 Triple Negative Breast Cancer Whole Genomes Illumina Genome Analyzer II;, Illumina HiSeq 2000; 243
EGAD00001000784 This study aims to use target capture to sequence regions of interest in DNA derived from breast cancer patients who received neoadjuvant chemotherapy. All patients had multiple biopsies performed before chemotherapy. Patients who had residual disease after the course of treatment underwent a further biopsy. We aim to characterise the mutations involved. Illumina HiSeq 2000; 242 cram
EGAD00001001383 TwinsUK whole exome sequencing using NimbleGen 2.1M SeqCap 242 bam,bai
EGAD00010000480 ccRCC case samples using 250K Nsp Affymetrix_250K(Nsp) - gtype 240
EGAD00001001076 Fastq files of 239 samples of biliary tract cancer Illumina HiSeq 2000; 239 fastq
EGAD00001001457 All samples from the "100" project Illumina HiSeq 2000; 238 bam
EGAD00010001103 Genotype data from Chad, Lebanon, and Yemen Illumina HumanOmni2.5-8 v1.1 B 238
EGAD00001003204 Understanding how cells sense and respond to their environment, and how these responses are modulated by genetic variation, are fundamental biological problems, particularly for understanding how pathogenic organisms invade and manipulate the cells of the human immune system. Macrophages recognize and respond to many important human pathogens, including HIV-1, Mycobacterium tuberculosis and Salmonella. This study will focus on the cellular response of human macrophages to Salmonella infection and how this response is modulated by the genetic background of the individual as well as by an additional pro-inflammatory stimulus (interferon-gamma priming). We will acquire 100 human induced pluripotent stem cell lines from the HipSci project, differentiate the cells in vitro into macrophages and expose them to four environmental conditions: (i) no stimulation, (ii) interferon-gamma (18h), (iii) Salmonella typhimurium SL1344 (5h), (iv) interferon-gamma (18h) + Salmonella (5h). Subsequently, we will isolate RNA from the samples for sequencing. Illumina HiSeq 2500;ILLUMINA 236
EGAD00001000438 UK10K_NEURO_EDINBURGH REL-2013-04-20 Illumina HiSeq 2000; 234 vcf
EGAD00010000484 ccRCC control samples using 250K Nsp Affymetrix_250K(Nsp) - gtype 234
EGAD00001001006 Dataset for whole exome sequencing of 113 pairs of tumor and normal DNA samples along with 8 cell lines. Illumina HiSeq 2000; 234 fastq
EGAD00001000880 233 bam,vcf
EGAD00001000360 The genome-wide landscape of somatically acquired mutations in mesothelioma has not been deeply characterised to date, but advances in DNA sequencing technology now allow this to be addressed comprehensively. Harnessing massively parallel DNA sequencing platforms, we will identify somatically acquired point mutations in all coding regions of the genome from patients with mesothelioma. In addition, using paired-end sequencing, we will map copy number changes and genomic rearrangements from the same patients. Illumina HiSeq 2000; 232 bam,cram
EGAD00010000391 Cambridge control samples using a 660K genotyping chip from Illumina Illumina Human 660K Quad BeadChips - Illuminus 232
EGAD00001002068 The dataset consists of 232 RNA-seq samples (whole blood) obtained from healthy females from the TwinsUK adult registry cohort. The samples were obtained at two time points separated on average by 22 months. Illumina HiSeq 2000; 232 bam,phenotype_file
EGAD00001003174 This study includes 116 liver cancer cases belonging to the LICA-CN project. Illumina HiSeq 2000;ILLUMINA 232
EGAD00001002151 Whole transcriptome sequencing of 231 children with newly-diagnosed ALL Illumina HiSeq 2000;ILLUMINA 231 fastq
EGAD00010000616 HumanOmni1-Quad genotyping array 230
EGAD00010001283 Illumina HumanOmni5-Quad BeadChips Illumina 229
EGAD00010000871 CLL and normal B cell samples using 450K 226
EGAD00001003125 WGS data of medulloblastoma tumor/control pairs. 224
EGAD00001001264 We propose to definitively characterise the somatic genetics of ER+ve, HER2-ve breast cancer through generation of comprehensive catalogues of somatic mutations in 500 cases by high coverage genome sequencing coupled with integrated transcriptomic and methylation analyses. Illumina HiSeq 2000; 223 cram
EGAD00010000610 Samples from the Greek island of Crete, MANOLIS cohort 221
EGAD00001001360 The majority of neuroblastoma patients have tumors that initially respond to chemotherapy, but a large proportion of patients will experience therapy-resistant relapses. The molecular basis of this aggressive phenotype is unknown. Whole genome sequencing of 23 paired diagnostic and relapsed neuroblastomas showed clonal evolution from the diagnostic tumor with a median of 29 somatic mutations unique to the relapse sample. Eighteen of the 23 relapse tumors (78%) showed RAS-MAPK pathway mutations. Seven events were detected only in the relapse tumor while the others showed clonal enrichment. In neuroblastoma cell lines we also detected a high frequency of activating mutations in the RAS-MAPK pathway (11/18, 61%) and these lesions predicted for sensitivity to MEK inhibition in vitro and in vivo. Our findings provide a rationale for genetic characterization of relapse neuroblastoma and show that RAS-MAPK pathway mutations may function as a biomarker for new therapeutic approaches to refractory disease. 221 other,vcf
EGAD00001003225 ICGC prostate UK study batches 4-6 prostatectomy analysis. Whole genome sequenced normal (blood) and malignant tissue pair of 111 patients. Illumina HiSeq 2000;ILLUMINA 221
EGAD00001000127 Burden of Disease in Sarcoma Illumina HiSeq 2000, Illumina HiSeq 2000; 220 bam,cram
EGAD00001003296 Integrated callset of low coverage Ethiopian and Egyptian genomes from the Pagani et al. 2015 AJHG paper (doi: http://dx.doi.org/10.1016/j.ajhg.2015.04.019) 220
EGAD00001000233 EGAD00001000233_UK10K_NEURO_EDINBURGH_REL_2012_07_05 Illumina HiSeq 2000; 219 vcf
EGAD00010000472 CLL Expression Array Affymetrix U219 219
EGAD00001001108 MMP-seq tumor samples (FASTQ) Illumina Genome Analyzer IIx; 218 fastq
EGAD00010000580 Gencode control samples using 550K 217
EGAD00001002704 DATA FILES FOR MULLIGHAN MEF2D RNASEQ UNSTRANDED Illumina HiSeq 2000;ILLUMINA 217 bam
EGAD00001003592 Merged bam files for PACA-CA Whole Exome Sequencing, for DCC release 25 216
EGAD00001001873 AML emerges as a consequence of accumulating independent genetic aberrations that direct regulation and/or dysfunction of genes, resulting in aberrant activation of signalling pathways, resistance to apoptosis and uncontrolled proliferation. Given the significant heterogeneity of AML genomes, AML patients demonstrate a highly variable response rate and poor median survival in response to current chemotherapy regimens. For the past 4 years we have conducted gene expression profiling on purified bone marrow populations equating to normal haematopoietic stem and progenitor cells from healthy subjects and patients with de novo AML in order to identify AML signatures of aberrantly expressed genes in cancer versus normal. We are now applying a series of bioinformatic methodologies combined with clinical and conventional diagnostic data to establish novel genomics strategies for improved prognostication of AML. Additionally, we use our AML signatures to unravel oncogenic signalling pathway activities in AML patients and test inhibitory drugs for these pathways in preclinical therapeutic programmes. We consider that superimposing GEP and clinical data for our AML patient cohort with additional data on their mutational status will significantly improve the prognostic power of the study as well as unravel yet unknown mutations associated with aberrant signalling activities of oncogenic pathways. Illumina HiSeq 2000; 215 cram
EGAD00001000317 UK10K_NEURO_EDINBURGH REL-2012-11-27 Illumina HiSeq 2000; 214 vcf
EGAD00010000423 Han Chinese samples using Illumina OMNIExpress (controls) Illumina OMNIExpress 213
EGAD00001000446 Fastq files of 213 samples of hepatocellular carcinoma (NCCRI) Illumina HiSeq 2000; 213 fastq
EGAD00001000181 UK10K_OBESITY_SCOOP REL-2012-01-13 Illumina HiSeq 2000; 212 vcf
EGAD00001000044 Recurrent Somatic Mutations in CLL Illumina Genome Analyzer IIx 212 fastq
EGAD00001000597 Illumina HiSeq 2000; 212 bam
EGAD00001002671 RNA-Seq data for 212 CD4-positive, alpha-beta T cell sample(s). 212 run(s), 212 experiment(s), 212 analysis(s) on human genome GRCh37. Analysis documentation available at http://ftp.ebi.ac.uk/pub/databases/blueprint/blueprint_Epivar/protocols/README_rnaseq_analysis_sanger_20160816 Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 212 bam,fastq
EGAD00001003551 The samples include paired tumor and normal tissues from 106 patients. High-coverage whole-exome sequencing (WES) or whole-genome sequencing of DNA samples was performed on the Illumina HiSeq 2000 system. Illumina HiSeq 2000;ILLUMINA 212
EGAD00001001915 RNA-Seq data for Mesothelioma. Illumina HiSeq 2000; 211 fastq
EGAD00001002773 As part of the International Parkinson Disease Genomics Consortium, exomes of Parkinson disease (PD) patients and healthy controls were sequenced to study the genetic etiology of PD. The Dutch cohort consists of 175 patients with a young age of onset below 50 years. Researchers can apply for access to fastq files for this cohort. Illumina HiSeq 2000;ILLUMINA 211 fastq
EGAD00001003591 Merged bam files for PACA-CA Whole Genome Sequencing, for DCC release 25 211
EGAD00001001084 Illumina HiSeq 2000; 209 fastq
EGAD00001001625 release_2: ICGC PedBrain: whole genome sequencing Illumina HiSeq 2000;, Illumina Genome Analyzer IIx; 209 bam,fastq
EGAD00001001691 Esophageal cancer is one of the most aggressive cancers and the sixth leading cause of cancer death worldwide [1]. Approximately 70% of global esophageal cancers occur in China, and over 90% of these cases are histologically esophageal squamous cell carcinoma (ESCC) [2-3]. Currently, there are limited clinical approaches for early diagnosis and treatment of ESCC, resulting in a 5-year survival rate of 10%. Meanwhile, the full repertoire of genomic events leading to the pathogenesis of ESCC remains unclear. Here we present a comprehensive genomic analysis of 158 ESCC cases, as part of the International Cancer Genome Consortium (ICGC) Research Projects (http://icgc.org/icgc/cgp/72/371/1001734). We conducted whole-genome sequencing in 14 ESCC cases and whole-exome sequencing in 90 cases. Illumina HiSeq 2000; 208 fastq
EGAD00001001916 Targeted sequencing using SPET for Mesothelioma. Illumina HiSeq 2000; 207 fastq
EGAD00001000122 DATA_SET_ICGC_PedBrainTumor_Medulloblastoma Illumina HiSeq 2000, Illumina Genome Analyzer IIx 206 bam
EGAD00001002675 RNA-Seq data for 205 mature neutrophil sample(s). 205 run(s), 205 experiment(s), 205 analysis(s) on human genome GRCh37. Analysis documentation available at http://ftp.ebi.ac.uk/pub/databases/blueprint/blueprint_Epivar/protocols/README_rnaseq_analysis_sanger_20160816 Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 205 bam,fastq
EGAD00001000202 Neuroblastoma samples (Analyses_vcf files) 204 vcf
EGAD00001000428 204 individuals were genotyped with the Illumina 2.5M Omni chip. Filtered genotypes were imputed into the 1000 genomes project European panel SNPs. Beagle R2 is indicated in VCF files for further filtering. See Materials and Methods in publication for details. 204 vcf
EGAD00001000196 Neuroblastoma samples Complete Genomics; 203 CompleteGenomics_native
EGAD00001002123 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: MALY-DE. 202 readme_file,bai,bam
EGAD00001000323 Sequencing data for Australian Pancreatic Cancer study submitted 20130102 AB SOLiD 4 System;, Illumina HiSeq 2000; 200 bam
EGAD00001000272 Genomic Alterations in Gingivo-buccal Cancer: ICGC-India Project_YR01 454 GS FLX Titanium;, Illumina HiSeq 2000; 200 bam
EGAD00001001071 Samples from the "100" project that are in the ICGC PanCancer project. Illumina HiSeq 2000; 200 bam
EGAD00001001051 Illumina HiSeq 2000; 200 fastq
EGAD00001002692 DATA FILES FOR MULLIGHAN MEF2D RNASEQ STRANDED Illumina HiSeq 2000;ILLUMINA 200 bam
EGAD00010001131 The 100 European-descent (EUB) and 100 African-descent (AFB) Belgians studied were genotyped for a total of 4,301,332 SNPs on the Illumina HumanOmni5-Quad BeadChips. Whole-exome sequencing was carried out for the same 200 individuals with the Nextera Rapid Capture Expanded Exome kit, on the Illumina HiSeq 2000 platform, with 100-bp paired-end reads. This kit delivers 62 Mb of genomic content per individual, including exons, untranslated regions (UTR), and microRNAs. Omni5 and exome datasets were merged, yielding a concordance rate between platforms of 99.93%. Illumina HumanOmni5-Quad and exome sequencing 200
EGAD00001001100 DCC Project Code: SKCA-BR Skin Adenocarcinoma - BR Brazil Illumina HiSeq 2500;ILLUMINA 200
EGAD00001000133 The landscape of cancer genes and mutational processes in breast cancer Illumina HiSeq 2000, Illumina Genome Analyzer II 199 bam
EGAD00001000728 Low coverage whole genome sequencing of samples from individuals from Friuli Venezia Giulia, an Italian genetic isolate population. Illumina HiSeq 2000; 199 bam
EGAD00001001443 RNASeq sequencing. Each library was sequenced using TruSeq SBS Kit v3-HS, in paired-end mode with a read length of 2 × 76 bp. We generated more than 20 million paired-end reads for each sample in a fraction of a sequencing lane on HiSeq2000 (Illumina Inc.) following the manufacturer’s protocol. Image analysis, base calling and quality scoring of the run were processed using the manufacturer’s software Real Time Analysis (RTA 1.13.48) and followed by generation of FASTQ sequence files. Illumina Genome Analyzer II; 199 fastq
EGAD00001001853 This dataset contains data from: 17 patients studied by WGS; 49 patients studied by WES; 9 of these 49 patients studied by RNA-seq at 2 time points; and the same 9 patients studied by ERRBS at 2 time points. Illumina HiSeq 2000; 199 fastq
EGAD00001000758 dataset for BGI bladder cancer project Illumina Genome Analyzer II; 198 fastq
EGAD00001002156 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: ESAD-UK. 198 readme_file,bai,bam
EGAD00001001913 Exome sequencing data for Mesothelioma Illumina HiSeq 2500; 198 fastq
EGAD00001002663 BLUEPRINT: A human variation panel of genetic influences on epigenomes and transcriptomes in three immune cells (WGS) Illumina HiSeq 2000;ILLUMINA 197 vcf,cram
EGAD00001002674 RNA-Seq data for 197 CD14-positive, CD16-negative classical monocyte sample(s). 197 run(s), 197 experiment(s), 197 analysis(s) on human genome GRCh37. Analysis documentation available at http://ftp.ebi.ac.uk/pub/databases/blueprint/blueprint_Epivar/protocols/README_rnaseq_analysis_sanger_20160816 Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 197 bam,fastq
EGAD00001001270 Illumina HiSeq 2000; 196 bam
EGAD00001001322 A comprehensive characterisation and analysis of human breast cancers through whole-genome sequencing. Illumina HiSeq 2000; 196 bam
EGAD00001002684 Whole genome sequencing of 98 tumour-normal pairs for the PAEN-AU pancreatic neuroendocrine cancer project. 196 bam
EGAD00001003804 Exome fastq files of 98 hepatocellular carcinoma and matched normal samples (BCM, HCC-JP) Illumina HiSeq 2000;ILLUMINA 196
EGAD00001002112 RNA-seq data from 195 pediatric BCP-ALL cases. Alignment: TopHat 2.0.7. Reference genome: hg19. Illumina HiScanSQ; 195 fastq
EGAD00001002130 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: CLLE-ES. 194 readme_file,bai,bam
EGAD00001001387 Using high-throughput sequencing technologies and analytical tools, we are conducting an exome sequencing study to help us understand the population genetics of a Croatian island isolate. The sample comprises 200 subjects from the Adriatic island of Vis, selected to reflect islanders with at least four known ancestors in the grandparental line who are original islanders. Illumina HiSeq 2000; 193 bam
EGAD00001003131 The dataset consists of two main sample groups. 1) The inter-tumour sample group contains a total of 97 samples from 27 patients. Each patient has a single normal and primary sample as well as one or more metastases. All samples were sequenced using IonTorrent PGM and a custom colorectal cancer (CRC) panel. 2) The intra-tumour sample group contains a total of 68 samples from a single tumour as well as a normal tissue sample. All 68 samples were sequenced using IonTorrent PGM and a custom CRC panel. Shallow whole genome sequencing was additionally applied to 10 of the samples using Illumina HiSeq 4000. Ion Torrent PGM;ION_TORRENT, Illumina HiSeq 4000;ILLUMINA 193
EGAD00001003529 HipSci - Healthy Normals - RNA Sequencing - July 2017 Illumina HiSeq 2500 (ILLUMINA) 193
EGAD00001000616 Pilocytic Astrocytoma ICGC PedBrain whole genome sequencing Illumina HiSeq 2000; 192 bam
EGAD00010000425 Han Chinese samples using Immunochip HanChinese_Immunochip 192
EGAD00001000979 We are developing a protocol to differentiate mouse and human induced pluripotent stem (IPS) and embryonic stem (ES) cells towards the haematopoietic pathway to generate erythrocytes in vitro. This system has many applications such as the study of the role of specific genes and human polymorphisms in infectious diseases such as malaria, as well as haematological diseases such as myelodysplastic syndrome. The nature of the in vitro differentiation process means that a heterogeneous population of cells is generated. In order to understand the types of cells produced with our protocol, we have performed a single cell analysis, which has the power to reveal the different populations of cells and their characteristics. For this, a cDNA library has been made that needs to be sequenced to obtain the gene expression profiles of the different cells. With this information we will be able to assess the quality of the differentiation protocol and improve it in order to produce better cells for the downstream applications. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2500; 192 cram
EGAD00001002132 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: PACA-AU. 192 readme_file,bai,bam
EGAD00001003148 Microfluidic direct library preparation (DLP) single-cell whole-genome BAM files for near-diploid immortalized lymphoblastoid cell line GM18507. NextSeq 500;ILLUMINA 192
EGAD00001003152 Microfluidic direct library preparation (DLP) single-cell whole-genome BAM files for near-diploid immortalized breast epithelial cell line 184-hTERT-L2. Illumina HiSeq 2500;ILLUMINA 192
EGAD00001000265 This study uses a focused, bespoke bait pull-down library method to target findings from chondrosarcoma whole-genome and whole-exome sequencing studies in order to validate them. The method will also be applied to a larger set of tumour-only samples to determine the prevalence of these findings in a larger set of patients. Illumina HiSeq 2000;ILLUMINA 190
EGAD00001001116 We propose to definitively characterise the somatic genetics of Prostate cancer through generation of comprehensive catalogues of somatic mutations by high coverage genome sequencing. See ICGC website for more information: http://icgc.org/icgc/cgp/70/508/71331 Illumina HiSeq 2000; 190 bam
EGAD00001000702 Complete set of bam files associated with study EGAS00001000622 190 bam
EGAD00001002131 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: RECA-EU. 190 bam,bai,readme_file
EGAD00001003264 ICGC DCC Release 24, PACA-CA Exome sequence 190
EGAD00001000782 Whole-genome sequencing was performed by Illumina Inc (San Diego, CA). Libraries were constructed with ~300bp insert length and paired-end 100bp reads were sequenced on Illumina HiSeq2000. Illumina HiSeq 2000;ILLUMINA 190
EGAD00001000129 Essential Thrombocythemia Myeloproliferative Disease exome sequencing Illumina HiSeq 2000, Illumina HiSeq 2000; 189 bam
EGAD00010000387 Cambridge control samples using a 1.2M genotyping chip from Illumina Illumina Human 1.2M Duo custom BeadChips v1 - Genome Studio 188
EGAD00001001037 A total of 395 couples underwent IVF-PGD treatment, including 129 couples tested with an NGS-based method and 266 couples tested with an SNP array-based method for the detection of embryonic chromosomal abnormalities. The NGS test used low-coverage whole-genome sequencing on the HiSeq 2000 platform, and the SNP array test used the Affymetrix Gene Chip Mapping Nsp I 262K array. The average age of patients was 32.1 years (range 20-44 years). Illumina HiSeq 2000; 188 fastq
EGAD00001001066 Dynamics of genomic clones in breast cancer patient xenografts at single cell resolution Illumina MiSeq;, Illumina HiSeq 2000; 188 bam
EGAD00001001624 release_2: ICGC PedBrain: whole exome sequencing and Target-Seq Illumina HiSeq 2000; 188 bam
EGAD00001003220 Whole genome, whole exome, and custom panel sequencing of high-grade meningioma cohort 188
EGAD00010000421 Han Chinese samples using Affymetrix (controls) Affymetrix_6.0 187
EGAD00001000079 PREDICT Illumina HiSeq 2000, Illumina HiSeq 2000; 186 bam,cram
EGAD00001001256 Clonal hematopoiesis was investigated in patients with aplastic anemia using next-generation sequencing and single-nucleotide polymorphism (SNP) array-based karyotyping. Illumina HiSeq 2000; 186 bam
EGAD00001001849 Brain-expressed miRNA genes were sequenced in Swedish schizophrenia patients. Illumina MiSeq; 186 fastq
EGAD00001002670 ChIP-Seq data for 182 mature neutrophil sample(s). 2847 run(s), 366 experiment(s), 355 analysis(s) on human genome GRCh37. Analysis documentation available at http://ftp.ebi.ac.uk/pub/databases/blueprint/blueprint_Epivar/protocols/README_chipseq_analysis_ebi_20160816 Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 186 fastq,cram,bam
EGAD00010000889 Gencode control samples using SNP6.0 SNP6.0 183
EGAD00001002179 Background: A rare subgroup of HIV-infected individuals naturally controls infection without treatment. These "elite controllers" constitute an important model for the natural control of HIV infection. Indeed, the study of these individuals may provide insights into strategies for the development of HIV vaccines. Although several HLA and chemokine alleles are known to be over-represented in elite controllers, only a small portion of HIV phenotypic variation is explained by known genetic variants. The elite controller phenotype is rare and distinct, representing the extreme of an infectious disease trait. As such, this phenotype may be partly explained by variation in host immune control, which may be characterized by differences in rare functional genetic variants. Genomic regions underlying elite control can potentially be identified by comparing the presence or frequency of variants in this group to that representing the opposite extreme. In this context, "rapid progressors" is a group defined by its rapid immunological and clinical disease progression. Aim: To extend an existing study, in order to identify DNA sequence variants involved in the control of HIV infection with greater statistical resolution. Specifically, we aim to sequence up to 200 exomes from multiple cohort studies within the EuroCoord CASCADE collaboration (a collaboration of 25 HIV seroconversion cohort studies across Europe). Illumina HiSeq 2000; 183 cram
EGAD00001001693 Fastq files of RNAseq of 182 samples of biliary tract cancer Illumina HiSeq 2000; 182 fastq
EGAD00001001690 Tumor-Normal paired samples of PTC Illumina HiSeq 2000; 182 fastq
EGAD00001001933 HipSci - Healthy Normals - RNA Sequencing - January 2016 Illumina HiSeq 2000;ILLUMINA 181 other,cram,bai,bam
EGAD00010001139 HipSci - Healthy Normals - Methylation Array - October 2016 Illumina 181
EGAD00000000033 NcOEDG Helsinki 2 samples Illumina HumanHap 300 180
EGAD00001000110 Breast Cancer Exome Sequencing Illumina HiSeq 2000, Illumina Genome Analyzer II 179 bam
EGAD00001001973 Exome sequencing of 184 samples from consanguineous families with different congenital heart defects collected at KAIMRC, Riyadh, Saudi Arabia. Illumina HiSeq 2000;, Illumina HiSeq 2500; 179 cram
EGAD00001003425 An EGFR-mutant NSCLC cell line that is sensitive to AZD9291 inhibition was mutagenised with the chemical mutagen ENU and then drug-selected using AZD9291. Single-cell-derived colonies were then manually picked and expanded in drug. Resistance was confirmed in a 14-day assay, and DNA was collected. These clones then underwent targeted amplicon-based sequencing to confirm candidate resistance effectors hypothesised from currently available literature. This dataset contains all the data available for this study on 2017-07-05. Illumina MiSeq;ILLUMINA 177
EGAD00000000027 eQTL data for European newborns Illumina HumanHap550-2v3_B-Beadstudio 176
EGAD00001000760 Dataset for esophageal cancer: 17 pairs for whole-genome sequencing and 71 pairs for whole-exome sequencing Illumina HiSeq 2000; 176 fastq
EGAD00001003550 Cell line exome sequencing Illumina HiSeq 2500;ILLUMINA 176
EGAD00001003760 There are 88 paired samples from HCC patients, comprising tumors and matched adjacent normal tissues, which were sequenced on the Illumina HiSeq 2000 platform. Illumina HiSeq 2000;ILLUMINA 176
EGAD00001000443 UK10K_NEURO_MUIR REL-2013-04-20 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 175 vcf
EGAD00001001000 Background: The disease course of patients with diffuse low-grade glioma is notoriously unpredictable. Temporally and spatially distinct samples may provide insight into the evolution of clinically relevant copy number aberrations (CNAs). The purpose of this study is to identify CNAs that are indicative of aggressive tumor behaviour and can thereby complement the prognostically favorable 1p/19q co-deletion. Results: Genome-wide 50 bp single-end sequencing was performed to detect CNAs in a clinically well-characterized cohort of 98 formalin-fixed paraffin-embedded low-grade gliomas. CNAs were correlated with overall survival as an endpoint. Seventy-five additional samples from spatially distinct regions and paired recurrent tumors of the discovery cohort were analysed to interrogate intratumoral heterogeneity and spatial evolution. Loss of 10q25.2-qter is a frequent subclonal event and significantly correlates with an unfavorable prognosis. A significant correlation is furthermore observed in a validation set of 126 and a confirmation set of 184 patients. Loss of 10q25.2-qter arises in a longitudinal manner in paired recurrent tumor specimens, whereas the prognostically favorable 1p/19q co-deletion is the only CNA that is stable across spatial regions and recurrent tumors. Conclusions: CNAs in low-grade gliomas display extensive intratumoral heterogeneity. Distal loss of 10q is a late-onset event and a marker for reduced overall survival in low-grade glioma patients. Intratumoral heterogeneity and higher frequencies of distal 10q loss in recurrences suggest this event is involved in outgrowth to the recurrent tumor. Illumina HiSeq 2000; 175 fastq
EGAD00001003300 Paired-end reads were aligned to human reference genome build hg19 by BWA. Single nucleotide variants and small insertions/deletions were called by GATK resulting in a single VCF file including all 176 samples. 175
EGAD00001000074 Integrative Oncogenomics of Multiple Myeloma Illumina HiSeq 2000, Illumina Genome Analyzer II 174 bam,srf
EGAD00001002672 ChIP-Seq data for 172 CD14-positive, CD16-negative classical monocyte sample(s). 572 run(s), 345 experiment(s), 340 analysis(s) on human genome GRCh37. Analysis documentation available at http://ftp.ebi.ac.uk/pub/databases/blueprint/blueprint_Epivar/protocols/README_chipseq_analysis_ebi_20160816 Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 174 bam,fastq
EGAD00010001063 blood-based gene expression from breast cancer cases IlluminaHuman AWG-6 and HT12 173
EGAD00010001064 tumor-based gene expression from breast cancer cases IlluminaHuman HT12 173
EGAD00001000442 UK10K_NEURO_IOP_COLLIER REL-2013-04-20 Illumina HiSeq 2000; 172 vcf
EGAD00001003235 Raw exome sequence data (FASTQ) for the GATCI project unspecified;ILLUMINA 172
EGAD00001001960 upcoming publication Illumina HiSeq 2000; 171 bam
EGAD00001003279 RNA sequencing data for 170 medulloblastoma tumor samples Illumina HiSeq 2000;ILLUMINA 171
EGAD00001000235 EGAD00001000235_UK10K_NEURO_IOP_COLLIER_REL_2012_07_05 Illumina HiSeq 2000; 170 vcf
EGAD00001000949 Validation of variants identified by exome sequencing in sequential samples obtained after treatment cycles with AZA. Illumina HiSeq 2000; 170 cram
EGAD00010000371 Case and control samples (Genotypes) Infinium_370k - GenomeStudio 170
EGAD00001003400 We present targeted NGS panel data from 170 samples that were processed using the TruSight™ Cancer (TSC) panel (Illumina, San Diego, CA, USA), which targets 94 genes and 284 SNPs associated with a predisposition towards cancer. The samples are enriched for CNVs in the genes of interest. All CNVs have previously been assessed with MLPA and can therefore be considered as confirmed. Illumina MiSeq;ILLUMINA 170
EGAD00001001375 Samples will be from the BRF113683 (BREAK-3) study, which is a Phase III Randomized, Open-label Study Comparing GSK2118436 to Dacarbazine (DTIC) in Previously Untreated Subjects With BRAF Mutation Positive Advanced (Stage III) or Metastatic (Stage IV) Melanoma (n=250 enrolled). *NGS: [Agilent capture (Sanger V2 panel): 360 genes and 20 gene fusions; Illumina HiSeq sequencing] *CNV: [via NGS or Affy SNP 6.0 or Illumina Omni (TBD)] Bioinformatics: Analysis will be performed using core Sanger informatics pipelines similar to those previously described (Papaemmanuil E et al. (2013) Blood. 22:3616-3627). Briefly, copy number analysis will be performed using the ASCAT algorithm, and calling of base substitutions and of small insertions and deletions using the CaVEMan and Pindel algorithms, respectively. Statistical approaches including generalized linear models will be used to predict clinical variables such as maximum clinical response and duration of response using genetic data. Sanger and EBI will conduct the analysis; raw data and correlation with clinical endpoints will be analyzed by both EBI/Sanger and GSK (unique pipeline analyses to increase call confidence). Illumina HiSeq 2500; 169 cram
EGAD00001001118 Gastric Cancer (GC) is a highly heterogeneous disease. To identify potential clinically actionable therapeutic targets that may inform individualized treatment strategies, we performed whole-exome sequencing on 78 GCs of differing histologies and anatomic locations, as well as whole-genome sequencing on two GC cases, each with 3 primary tumours and 2 matching lymph node metastases. The data showed two distinct GC subtypes with either high-clonality (HiC) or low-clonality (LoC). Illumina HiSeq 2000; 168 fastq
EGAD00001000170 UK10K_NEURO_MUIR REL-2012-01-13 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 167 vcf
EGAD00001000236 EGAD00001000236_UK10K_NEURO_MUIR_REL_2012_07_05 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 167 vcf
EGAD00001000796 This project aims to study at least 90 exomes from families with congenital heart disease. The samples have been selected in Leuven in collaboration with Koen Devriendt. Ethics approval has been sought in Leuven, Belgium, and an HDMMC agreement for submitting these samples is in place at the WTSI. The phenotypes on which we will primarily focus our analysis are severe Left Ventricular Outflow Tract Obstructions (LVOTO) and Atrioventricular Septal Defect (AVSD). The indexed Agilent whole exome pulldown libraries will be sequenced on 75bp PE HiSeq (Illumina). Illumina HiSeq 2000; 167 bam
EGAD00001000322 UK10K_NEURO_MUIR REL-2012-11-27 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 166 vcf
EGAD00001000096 Pancreatic adenocarcinoma QCMG 20120201 AB SOLiD 4 System 166 bam
EGAD00001001090 This study aims to define the landscape of somatic mutations in sun exposed human skin by deep sequencing, analyse their frequency and use the data to infer the effect of mutations on proliferating cell behaviour. The frequency of each mutation will reflect the size of the clone of cells in the tissue sample. By analyzing small samples, clones with as few as 100 cells will be detectable. Allele frequency distributions for each mutation will be used to infer cell fate using published methods (Klein et al. 2010). This study will shed unprecedented light on the early clonal events that lead to the emergence of cancer. Illumina HiSeq 2000; 166 bam,cram
EGAD00010000254 CLL Methylation Arrays Illumina HumanMethylation450 165
EGAD00001003121 The dataset is composed of FASTQ files from 165 samples of small round cell sarcomas which were RNA-sequenced (whole transcriptome) with either Illumina HiSeq 2500 (120 million reads per sample, paired-end 100 bp) or Illumina NextSeq 500 (110 million reads per sample, paired-end 150 bp). Illumina HiSeq 2500;ILLUMINA, NextSeq 500;ILLUMINA 165
EGAD00001002740 We propose to definitively characterise the somatic genetics of breast cancer through generation of comprehensive catalogues of somatic mutations in breast cancer cases by high coverage genome sequencing. Illumina MiSeq;ILLUMINA, Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 164
EGAD00001003445 Clear cell renal cancer is characterized by near-universal loss of the short arm of chromosome 3 (3p). This event arises through unknown mechanisms, but critically results in the loss of several tumor suppressor genes. We analyzed whole genomes from 95 biopsies across 33 patients with clear cell renal cancer (ccRCC) recruited into the Renal TRACERx study. We find novel hotspots of point mutations in the 5'-UTR of TERT, targeting a MYC-MAX repressor, that result in telomere lengthening. The most common structural abnormality generates simultaneous 3p loss and 5q gain (36% of patients), typically through chromothripsis. Using molecular clocks, we estimate this occurs in childhood or adolescence, generally preceding emergence of the most recent common ancestor by years to decades. Similar genomic changes are seen in inherited kidney cancers. Modeling differences in age-incidence between inherited and sporadic cancers suggests that the number of cells with 3p loss capable of initiating sporadic tumors is no more than a few hundred. Targeting essential genes in deleted regions of chromosome 3p could represent a potential preventative strategy for renal cancer. HiSeq X Ten;ILLUMINA 164
EGAD00001001851 Brain-expressed miRNA genes were sequenced in Belgian epilepsy patients. Illumina MiSeq; 163 fastq
EGAD00000000040 GenomEUtwin Danish (DK) samples Illumina HumanHap 300 162
EGAD00001001410 Whole-exome sequencing of 81 tumor/normal pairs of adult T-cell leukemia/lymphoma Illumina HiSeq 2000; 162 bam
EGAD00010001433 Illumina HumanMethylation450 BeadChip Illumina 162
EGAD00010000943 Sahel population study using 2.5M Illumina HumanOmni2.5 161
EGAD00010000690 Genome-wide SNP genotyping of African rainforest hunter-gatherers and neighbouring agriculturalists by Illumina HumanOmniExpress 160
EGAD00001001984 To identify recurrent somatic alterations in this unique subset of gastric cancers, whole exome and SNP6 analyses were performed using frozen cancer tissue. The somatic mutation analyses were also performed using blood of the same patients. Illumina HiSeq 2500; 160 bam
EGAD00001000321 UK10K_NEURO_IOP_COLLIER REL-2012-11-27 Illumina HiSeq 2000; 158 vcf
EGAD00001001341 We propose to definitively characterise the somatic genetics of breast cancer through generation of comprehensive catalogues of somatic mutations in breast cancer cases by high coverage genome sequencing coupled with integrated transcriptomic and methylation analyses. Illumina HiSeq 2000; 158 cram
EGAD00001002129 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: BRCA-EU. 158 readme_file,bai,bam
EGAD00001002673 ChIP-Seq data for 154 CD4-positive, alpha-beta T cell sample(s). 355 run(s), 265 experiment(s), 250 analysis(s) on human genome GRCh37. Analysis documentation available at http://ftp.ebi.ac.uk/pub/databases/blueprint/blueprint_Epivar/protocols/README_chipseq_analysis_ebi_20160816 Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 158 bam,fastq
EGAD00001002197 Recent GWAS studies have made extensive use of large eQTL data sets to functionally annotate index SNPs. With a large number of association signals located outside coding regions there has been an intense search among sequence variants affecting gene expression at the transcriptional level. However, little progress has been made in mapping regulatory variants that affect protein levels at the translational or post-translational level. It is now possible to undertake a protein QTL scan for focused sets of e.g. oxidized proteins by mass spectrometry. We have established a collaboration with a longitudinal, family-based study in France, the Stanislas cohort, which comprises circa 1000 nuclear families (4,295 individuals) and has follow up data for 10 years (three visits). We have undertaken a pilot study in a focus set of 257 subjects from 79 families with the aim to integrate GWAS, transcriptomic and DNA methylation data with proteomic data on a set of 100 proteins measured in PBMCs. We have already generated GWAS data using Illumina's core-exome chip as well as DNA methylation profiles with the 450K array. We propose to use RNA seq to generate transcriptomic data of the corresponding PBMCs. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2500; 155 cram
EGAD00001002205 The BLUEPRINT project is a large-scale project investigating epigenetic mechanisms involved in blood formation, in health and disease. The human variation workpackage (WP10, led by NS) of the project seeks to characterize the effect of common sequence variation on the epigenome status of a cell. To do this, the project will use highly purified blood cells to minimise "experimental noise" and therefore enhance the power to discover modest effects. Two peripheral blood cell types, the CD14+CD16- monocyte (an important central orchestrator of adaptive immunity and a bridge between innate and adaptive immunity) and the CD65+CD9- neutrophilic granulocyte (the frontline cell for innate immunity), have been selected for this purpose. The two types of cells will be obtained at high purity from adult blood (AB) of 200 healthy males and females, respectively. Cells will be purified by using already validated and fully operational protocols that are based on density gradient centrifugation of the buffy coat obtained from whole blood, followed by magnetic bead-based purification using monoclonal antibodies against Cluster of Differentiation (CD) lineage-specific cell surface markers. Units of 475 ml of AB will be obtained from consenting volunteers of the Cambridge BioResource (CBR), a panel of 10,000 healthy volunteers local to Cambridge who have already consented to participate in biomedical research and for whom biological samples (DNA, plasma, serum) and lifestyle data have been deposited in a repository and database, respectively. We are requesting funding from the Human Diversity project to sequence the genomes of the 200 CBR volunteers at low pass (6x coverage). Nuclei, DNA and RNA will be recovered from the purified cells and made available for RNA-seq, DNA-seq and ChIP-seq, while genomic DNA for whole-genome sequencing will be recovered from the DNA repository. Illumina HiSeq 2000;, Illumina HiSeq 2500; 155 cram
EGAD00001000339 Multiple myeloma is an incurable plasma cell malignancy whose molecular pathogenesis is incompletely understood. We used whole exome sequencing, copy number profiling and cytogenetics to analyse 84 samples from 67 patients with myeloma. In addition to known myeloma genes, we identify new candidate genes, including truncations of SP140, ROBO1 and FAT3 and clustered missense mutations in EGR1. We find oncogenic mutations in cancer genes not previously implicated in myeloma, including SF3B1, PIK3CA and PTEN. We define diverse processes contributing to the mutational repertoire, including kataegis and somatic hypermutation. Most cases have at least one cluster of subclonal variants, including subclonal driver mutations, implying ongoing tumor evolution. Serial samples revealed diverse patterns of clonal evolution, including linear evolution, differential clonal response and branching evolution. Our findings reveal the myeloma genome to be heterogeneous across patients and, within individual patients, to exhibit diversity in clonal admixture and dynamics in response to therapy. Illumina Genome Analyzer II;, Illumina HiSeq 2000; 154 bam,srf
EGAD00001001107 MMP-seq cell lines (FASTQ) Illumina Genome Analyzer IIx; 154 fastq
EGAD00001003467 This dataset contains exome sequencing data from 77 tumor-normal pairs of HCC patients from National Taiwan University, Taiwan. Illumina HiSeq 2500;ILLUMINA 154
EGAD00000000042 GenomEUtwin Finnish (FIN) samples Illumina HumanHap 300 153
EGAD00001000654 DATA FILES FOR BALL-PAX5 Illumina HiSeq 2000; 153 bam
EGAD00010001032 RNA Expression using Illumina HT12 v3 Illumina HT12 v3 153
EGAD00001000117 Myelodysplastic Syndrome Exome Sequencing Illumina HiSeq 2000, Illumina Genome Analyzer II, Illumina HiSeq 2000; 152 bam,srf
EGAD00010000458 Controls using 450K DNA methylation 151
EGAD00001000217 UK10K_RARE_CILIOPATHIES REL-2012-07-05 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 150 vcf
EGAD00001001092 Approximately 80% of clinically clearly diagnosed patients suffering from primary ciliary dyskinesia (PCD) cannot be assigned to a specific gene defect. Despite extensive research on PCD, and despite the increasing number of PCD genes and knowledge about their sites of action (e.g. as structural components or cytoplasmic pre-assembly factors), the biology of motile cilia and the pathomechanism leading to PCD are largely unknown. The aim of this study is to identify novel PCD-related genes and processes relevant to motile cilia function. We will perform exome sequencing, focusing on the analysis of family trios. In these families, the diagnosis of PCD is secure, but the underlying gene defects have so far not been identified. Illumina HiSeq 2000; 150 cram
EGAD00001003188 Variants and genotypes called in 50 Danish parent-offspring trios from 80x Illumina sequencing data using BayesTyper. Data was produced using libraries of different insert sizes: 180, 500, 800, 2000, 5000, 10000 and 20000 bp. The sample IDs for the fathers and mothers are TrioID-01 and TrioID-02, respectively, and the IDs for the children are TrioID-0x, where x is a number between 3 and 7. 150
EGAD00001003157 Alignment of Genome Denmark Phase II dataset to GRCh38. The dataset consists of 150 Danish individuals (50 trios) sequenced to 80X. The BAM-file contains data from multiple libraries created from one individual with libraries of 180, 500, 800, 2000, 5000, 10000 and 20000 bp. The libraries were created using standard Illumina protocols for paired end reads (180-800bp libraries) and mate pair libraries (2kb-20kb). 150
EGAD00001003435 Whole Genome Sequencing for the paper titled "Orthotopic Patient-Derived Xenografts of Pediatric Solid Tumors" Illumina HiSeq 2000;ILLUMINA 150
EGAD00001001850 Genomic DNA from Swedish control individuals was pooled. Then the genomic sequence of brain-expressed miRNA genes was determined in the pools. Illumina MiSeq; 149 fastq
EGAD00001001425 The objectives of this project are to identify markers related to cancer therapy resistance in the blood of breast cancer patients and to study the genetic changes in cancer cells during the development of resistance. Whole-genome-amplified DNA from Circulating Tumor Cells (CTCs), selected during the course of systemic treatment from the blood of metastatic breast cancer patients, will be exome sequenced. The patients selected for this study did not respond to therapy. Illumina HiSeq 2000; 149 cram
EGAD00001003390 DCM cases (149 human DCM samples): human heart biopsies from 149 patients with dilated cardiomyopathy (DCM) were subjected to RNA sequencing in order to assess transcriptome variation. We used Illumina HiSeq2000 technology. Each sample dataset contains the output from tophat-1.4.1 (one *.bam file with the aligned reads and two *.fq files, one with the unaligned forward reads and one with the unaligned reverse reads). We reveal extensive differences in gene expression and splicing between dilated cardiomyopathy patients and controls. Illumina HiSeq 2000;ILLUMINA 149
EGAD00010001406 Breast cancer tissue and controls Exiqon 7th generation miRCURY LNA microRNA microarray system 149
EGAD00001003434 Whole Exome Sequencing for the paper titled "Orthotopic Patient-Derived Xenografts of Pediatric Solid Tumors" Illumina HiSeq 2000;ILLUMINA 149
EGAD00001000807 Whole Exome Sequencing (WES) for St. Jude High Grade Glioma (HGG) study Illumina HiSeq 2000; 148 bam
EGAD00001002125 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: BOCA-UK. 148 readme_file,bai,bam
EGAD00010000853 VeraCode GoldenGate GT Assay technology 147
EGAD00001000273 This study uses a focused bespoke bait pull-down library method to target findings from meningioma whole-genome and whole-exome sequencing studies in order to validate those findings. This method will also be used on a larger set of tumour-only samples in order to establish the prevalence of these findings in a larger set of patient samples. Illumina HiSeq 2000; 147 bam
EGAD00001002225 This study involves targeted sequencing of samples from myeloid malignancies at different timepoints to assess clonal evolution of the malignancy. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina MiSeq; 147 cram
EGAD00001001398 We performed exome sequencing on 205 patients with NSCLC. Illumina Genome Analyzer II;, Illumina HiSeq 2000; 147 fastq
EGAD00001003227 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: OV-AU. 146
EGAD00010001300 Medulloblastoma expression profiling Affymetrix expression array 146
EGAD00001000216 Exome capture sequencing of colon tumor/normal pairs Illumina HiSeq 2000; 144 fastq
EGAD00010000212 Normalized expression data; normals Illumina HT 12 144
EGAD00010000520 Healthy volunteer collection of European Ancestry Illumina OmniExpress v1.0-Illumina GenomeStudio 144
EGAD00001000980 This study involves a forward genetic screen to identify common insertion sites in drug-resistant clones. We will be utilising piggyBac transposon systems to generate multiple drug-resistant clones in a range of human cancer cell lines. Illumina MiSeq; 144 bam,cram
EGAD00010000642 CLL Expression Array 144
EGAD00001001909 Paired-end whole exome sequencing (Illumina) of primary enucleated retinoblastoma and matching lymphocyte DNA was performed to find somatic alterations related to oncogenesis. Illumina HiSeq 2500; 143 fastq
EGAD00001001628 Illumina MiSeq;, Illumina HiSeq 2500; 142 bam
EGAD00001001462 Exome sequencing of 142 samples with corresponding Sanger sequencing results for 409 variants and 321 negative sites. DNA libraries were prepared with the Illumina TruSeq sample preparation kit. The captured DNA libraries were PCR amplified using the supplied paired-end PCR primers. Sequencing was performed with an Illumina HiSeq2000 (SBS Kit v3, one pool per lane) generating 2x101-bp reads. Illumina HiSeq 2000;, Illumina HiSeq 2500; 142 fastq
EGAD00001001424 We obtained paired longitudinal specimens from a total of 38 glioblastoma (GBM) patients (34 primary and 4 secondary GBM patients). Treatment-naive initial tumors were available for 35 cases; for the other 3 cases, we used the first available recurrent tumors in lieu of initial tumors. Tumor specimens were subjected to whole-exome sequencing (27 of 38 cases, with the matched normal/blood for 22 of the 27 cases) and transcriptome sequencing (30 of 38 cases). Illumina HiSeq 2000;, Illumina HiSeq 2500; 141 bam
EGAD00001001926 Esophageal Squamous Cell Carcinoma (ESCC) is one of the deadliest cancers worldwide. We performed whole-exome sequencing of 71 ESCCs from Chinese patients. Illumina HiSeq 2000; 141 fastq
EGAD00001003237 Primary mucosal melanomas (MMs) arise from melanocytes located in mucosal membranes lining the respiratory, gastrointestinal and urogenital tracts. MMs frequently present late and have a poor prognosis; the 5-year survival rate is only 14%. MM makes up only ~1.4% of all melanomas and it is this rarity that makes knowledge of the genetic changes that contribute to its pathogenesis limited to a small number of exome/genome studies and other targeted studies. Thus to investigate the somatic alterations and mutation spectra in MM genomes, we have extracted genomic DNA from formalin-fixed, paraffin-embedded (FFPE) human MMs, and subjected them to whole exome sequencing. Given the propensity of MM to metastasize, we will also be sequencing metastatic MM lesions; primary and metastatic lesions from the same individual represent an excellent opportunity to identify potential drivers of metastasis in MM. Finally we will sequence 'normal' DNA from the same individual, where possible, to exclude germline variations. Illumina HiSeq 2000;ILLUMINA 141
EGAD00001000418 UK10K_RARE_NEUROMUSCULAR REL-2013-04-20 Illumina HiSeq 2000; 140 vcf
EGAD00001001073 miRNA-seq Cohort of 140 Formalin Fixed Paraffin Embedded Diffuse Large B-cell Lymphoma Patient Samples 140 bam
EGAD00001002157 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: MELA-AU. 140 readme_file,bai,bam
EGAD00001002190 Single-end BAM files of the targeted deep sequencing analysis of several mtDNA candidate regions in blood and buccal-derived DNA of the corresponding twin pairs. Illumina MiSeq; 140 bam
EGAD00001000215 RNA sequencing of colon tumor/normal sample pairs Illumina HiSeq 2000; 139 fastq
EGAD00001001022 nccRCC RNA-Seq data of consented samples Illumina HiSeq 2500; 139 fastq
EGAD00001003309 The study will investigate serial samples from the same patient taken at the time of MGUS or SMM diagnosis, and later at the time of evolution towards MM. Samples will be sequenced by whole genome along with a matched normal to obtain the highest possible amount of information to investigate genomic changes at disease evolution. This dataset contains all the data available for this study on 2017-04-27. HiSeq X Ten;ILLUMINA 139
EGAD00001001316 Exome sequence analysis of individuals with severe early onset inflammatory bowel disease, and their families. Individuals are ascertained through the COLORS in IBD study, which includes centres throughout UK and Europe. Illumina HiSeq 2000; 138 cram
EGAD00001001320 This is a study to test ATAC-seq protocols. CD4+ and CD8+ cells have been obtained from three different anatomical compartments. We aim to assay open-chromatin regions across these cells and perform comparative analyses. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina MiSeq; 138 cram
EGAD00001002187 To identify the transcriptome profile of this unique subset of gastric cancers, RNA-seq analyses were performed using frozen cancer tissue. Adjacent normal tissue from the same patients was used for differentially expressed gene selection and fusion gene prediction. Illumina HiSeq 2500; 138 bam
EGAD00001002110 Chronic lymphocytic leukemia (CLL) is characterized by substantial clinical heterogeneity, despite relatively few genetic alterations. To provide a basis for studying epigenome deregulation in CLL, we established genome-wide chromatin accessibility maps for 88 CLL samples from 55 patients using the ATAC-seq assay, and we also performed ChIPmentation and RNA-seq profiling for ten representative samples. Based on the resulting dataset, we devised and applied a bioinformatic method that links chromatin profiles to clinical annotations. Our analysis identified sample-specific variation on top of a shared core of CLL regulatory regions. IGHV mutation status – which distinguishes the two major subtypes of CLL – was accurately predicted by the chromatin profiles, and gene regulatory networks inferred for IGHV-mutated vs. IGHV-unmutated samples identified characteristic differences between these two disease subtypes. In summary, we discovered widespread heterogeneity in the chromatin landscape of CLL, established a community resource for studying epigenome deregulation in leukemia, and demonstrated the feasibility of chromatin accessibility mapping in cancer cohorts and clinical research. Illumina HiSeq 3000; 138 bam
EGAD00010001153 Family Trios on aCGH 8x60K Agilent 8x60K 138
EGAD00010000252 CLL Expression Arrays Affymetrix U219 137
EGAD00001001023 nccRCC Whole Exome sequencing data (consented samples only) Illumina HiSeq 2500; 137 fastq
EGAD00001003284 Whole exome sequencing of enteropathy-associated T cell lymphoma (EATL) tumors and paired normals, as well as RNA sequencing of EATL tumors, including: (1) 69 exome-capture, paired-end Illumina HiSeq sequencing BAM files from EATL tumor samples, (2) 36 exome-capture, paired-end Illumina HiSeq sequencing BAM files from EATL paired normal samples, and (3) 32 RNA-seq, paired-end Illumina HiSeq sequencing BAM files from EATL tumor samples. Illumina HiSeq 2500;ILLUMINA 137
EGAD00001001437 HipSci - Healthy Normals - Exome Sequencing - April 2015 Illumina HiSeq 2000; 136 cram,tabix,bai,vcf,bam
EGAD00001002709 ATAC-seq data for 136 sample(s) from venous blood, on Genome GRCh38. 141 run(s), 139 experiment(s), 139 alignment(s). Part of BLUEPRINT (September 2016). NextSeq 500;ILLUMINA 136 bam,fastq
EGAD00001003825 Illumina MiSeq;ILLUMINA, Illumina HiSeq 2000;ILLUMINA 134
EGAD00001002711 ChIP-Seq_H3K4me3 data for 133 mature neutrophil sample(s). 208 run(s), 136 experiment(s), 136 analysis(s) on human genome GRCh37. Analysis documentation available at http://ftp.ebi.ac.uk/pub/databases/blueprint/blueprint_Epivar/protocols/README_chipseq_analysis_ebi_20160816 Illumina HiSeq 2000;ILLUMINA 133 fastq,cram,bam
EGAD00010000908 Illumina SNP-arrays for matching retinoblastoma-blood pairs and retinoblastoma cell lines. HumanOmni1 Quad BeadChip 132
EGAD00010001403 Gene expression read counts Illumina HiSeq2000 132
EGAD00001000974 High-grade serous ovarian cancer (HGSC) is characterized by poor outcome, often attributed to emergence of treatment-resistant sub-clones. We sought to measure the degree of genomic diversity within primary, untreated HGSC to examine the natural state of tumor evolution prior to therapy. We performed exome sequencing, copy number analysis, targeted amplicon deep sequencing and gene expression profiling on thirty-one spatially and temporally separated HGSC tumor specimens (six patients) including ovarian masses, distant metastases, and fallopian tube lesions. We found widespread intra-tumoral variation in mutation, copy number, and gene expression profiles, with key driver alterations in genes present in only a subset of samples (e.g. PIK3CA, CTNNB1, NF1). On average, only 51.5% of mutations were present in every sample of a given case (range: 10.2% to 91.4%), with TP53 as the only somatic mutation consistently present in all samples. Complex segmental aneuploidies, such as whole genome doubling, were present in a subset of samples from the same individual, with divergent copy number changes segregating independently of point mutation acquisition. Reconstruction of evolutionary histories showed one patient with mixed HGSC and endometrioid histology with common etiologic origin in the fallopian tube and subsequent selection of different driver mutations in the histologically distinct samples. In this patient, we observed mixed cell populations in the early fallopian tube lesion, indicating diversity arises at early stages of tumorigenesis. Our results reveal that HGSC exhibit highly individual evolutionary trajectories and diverse genomic tapestries prior to therapy, exposing an essential biological characteristic to inform future design of personalized therapeutic solutions and investigation of drug resistance mechanisms. Illumina MiSeq;, Illumina HiSeq 2000; 131 bam
EGAD00001001898 The study will investigate serial samples from the same patient taken at the time of MGUS or SMM diagnosis, and later at the time of evolution towards MM. Samples will be sequenced by whole genome along with a matched normal to obtain the highest possible amount of information to investigate genomic changes at disease evolution. This dataset contains all the data available for this study on 2016-01-27. HiSeq X Ten; 131 cram
EGAD00001001438 HipSci - Healthy Normals - RNA Sequencing - May 2015 Illumina HiSeq 2000; 131 cram,bam,bai
EGAD00001002712 ChIP-Seq_H3K27me3 data for 131 mature neutrophil sample(s). 321 run(s), 134 experiment(s), 134 analysis(s) on human genome GRCh37. Analysis documentation available at http://ftp.ebi.ac.uk/pub/databases/blueprint/blueprint_Epivar/protocols/README_chipseq_analysis_ebi_20160816 Illumina HiSeq 2000;ILLUMINA 131 fastq,cram,bam
EGAD00001000298 UK10K_RARE_NEUROMUSCULAR REL-2012-11-27 Illumina HiSeq 2000; 130 vcf
EGAD00001000105 MuTHER adipose tissue small RNA expression Illumina Genome Analyzer II 130 bam
EGAD00001000200 Dilgom Exome Illumina HiSeq 2000; 130 bam
EGAD00010000552 Neuroblastoma samples 130
EGAD00001001027 The offspring of first cousin marriages have ~6% of their genome autozygous, i.e. homozygous identical by descent, or even more if there was further consanguinity in their ancestry. In the UK there are large populations with very high first cousin marriage rates of 20-50%. Sequencing the exomes of a sample of these individuals has the potential both to support genetic health programmes in these populations, and to provide genetic research information about rare loss of function mutations. This pilot study based on existing British-Pakistani cohort samples will identify homozygous individuals for almost all variants down to an allele frequency around 1%, plus individuals carrying hundreds of new homozygous rare loss-of-function variants, and will support development of community relations and ethics for a wider study currently being designed. The data deposited in the EGA consists of low coverage whole exome sequencing on these samples. Illumina HiSeq 2000; 130 cram
EGAD00001000975 65 prostate cancer cases transcriptome sequencing Illumina HiSeq 2000; 130
EGAD00001001004 65 prostate cancer cases wgs sequencing Illumina HiSeq 2000; 130
EGAD00001003547 ICGC PCAWG Dataset for RNA-Seq BAM aligned using Star. Project: LIRI-JP. 130
EGAD00001003546 ICGC PCAWG Dataset for RNA-Seq BAM aligned using TopHat2. Project: LIRI-JP. 130
EGAD00001000191 UK10K_RARE_CILIOPATHIES REL-2012-02-22 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 128 vcf
EGAD00001000615 UK10K_NEURO_FSZ REL-2013-04-20 Illumina HiSeq 2000; 128 vcf
EGAD00001000291 Exome sequencing identifies mutation of the ribosome in T-cell acute lymphoblastic leukemia Illumina HiSeq 2000; 128 bam
EGAD00001000781 Whole genome, high coverage, sequencing of 128 Ashkenazi Jewish controls 128 vcf
EGAD00001001844 Whole genome sequencing of 64 HER2-positive breast cancers Illumina HiSeq 2000; 128 bam
EGAD00010000946 Human samples, 450k analysis Illumina 450k 127
EGAD00010001099 Digital images of ovarian cancer metastases Aperio 127
EGAD00001003351 To comprehensively investigate the genetic relationship between PTC tumors and benign nodules, we collected a total of 127 fresh-frozen biopsy samples from 28 patients with concurrent thyroid benign nodule and PTC (n=20) or benign nodule alone (n=8). We carried out whole-exome sequencing on all 127 biopsy samples and RNA sequencing on a total of 40 samples. Illumina HiSeq 2500;ILLUMINA 127
EGAD00010000960 Definite and borderline rheumatic heart disease cases and patients with mild non-diagnostic valvulopathy recruited in Samoa HumanCore-24 BeadChip 126
EGAD00010001102 Genotype data from Chad, Lebanon, and Yemen Illumina HumanOmni2.5-8 v1.2 A 126
EGAD00001000101 ADCC Exome Sequencing Illumina Genome Analyzer II, Illumina HiSeq 2000; 125 bam
EGAD00001000417 UK10K_RARE_HYPERCHOL REL-2013-04-20 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 125 vcf
EGAD00001000413 UK10K_RARE_CHD REL-2013-04-20 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 125 vcf
EGAD00001001314 Sequence data from L1-amplicon libraries prepared from plasma DNA from a set of 24 female and 18 male controls without malignant disease, and from patients with breast (n=28) and prostate (n=61) cancer. Illumina MiSeq; 125 fastq
EGAD00001001016 DATA FILES FOR SJPhLike-RNASeq Illumina HiSeq 2000; 125 bam
EGAD00001000946 Divergent clonal selection dominates medulloblastoma at recurrence 125 bam
EGAD00001001421 Clinical Implications of Genomic Alterations in the Tumour and Circulation of Pancreatic Cancer Patients Illumina MiSeq; 125 fastq
EGAD00001001644 MicroRNAs (miRs) have been recognized as promising biomarkers. It is unknown to what extent tumor-derived miRs are differentially expressed between primary colorectal cancers (pCRCs) and metastatic lesions, and to what extent the expression profiles of tumor tissue differ from the surrounding normal tissue. Next-generation sequencing (NGS) of 220 fresh-frozen samples, including paired primary and metastatic tumor tissue and non-tumorous tissue from 38 patients, revealed expression of 2245 known unique mature miRs and 515 novel candidate miRs. Unsupervised clustering of miR expression profiles of pCRC tissue with paired metastases did not separate the two entities, whereas unsupervised clustering of miR expression profiles of pCRC with normal colorectal mucosa completely separated the tumor samples from their paired normal mucosa. Two hundred and twenty-two miRs differentiated both pCRC and metastases from normal tissue samples (false discovery rate (FDR) <0.05). The most highly expressed tumor-specific miRs were miR-21 and miR-92a, both previously described as involved in CRC and with potential as circulating biomarkers for early detection. Only eight miRs, 0.5% of the analysed miR transcriptome, were differentially expressed between pCRC and the corresponding metastases (FDR <0.1): five known miRs (miR-320b, miR-320d, miR-3117, miR-1246 and miR-663b) and three novel candidate miRs (chr 1-2552-5p, chr 8-20656-5p and chr 10-25333-3p). These results show that NGS identified previously unrecognized candidate miRs expressed in advanced CRC. In addition, miR expression profiles of pCRC and metastatic lesions are highly comparable and may be of similar predictive value for prognosis or response to treatment in patients with advanced CRC. Illumina HiSeq 2000; 125 fastq
EGAD00010000942 Breast lesions assayed with Affymetrix SNP 6.0 Affymetrix SNP 6.0 125
EGAD00001003596 The MITOEXME project aims to improve protocols for the molecular diagnosis of patients with OXPHOS disorders, with a focus on next-generation sequencing methods, and to increase knowledge of pathophysiological mechanisms through the identification of new targets and cellular studies. In this project we will sequence the exomes of 120 patients. This dataset contains all the data available for this study on 2017-08-29. Illumina HiSeq 2000;ILLUMINA 125
EGAD00001000294 UK10K_RARE_CHD REL-2012-11-27 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 124 vcf
EGAD00001000210 UK10K_RARE_CHD REL-2012-07-05 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 124 vcf
EGAD00001000297 UK10K_RARE_FIND REL-2012-11-27 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 124 vcf
EGAD00001000416 UK10K_RARE_FIND REL-2013-04-20 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 124 vcf
EGAD00001000420 UK10K_RARE_THYROID REL-2013-04-20 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 124 vcf
EGAD00001003278 Whole Exome and Target Sequencing Data in 75 Samples from 5 Hepatocellular Carcinoma Patients. The sequencing was performed by Illumina HiSeq 4000. Background and aims: Intratumoral heterogeneity (ITH) makes it difficult to identify mutations with targeted-therapy potential, whereas circulating cell-free DNAs (cfDNAs) could reflect nearly the entire mutation spectrum of a given tumor. We investigated how to minimize the limitations of ITH for profiling hepatocellular carcinoma (HCC). Methods: Thirty-two multi-regional HCC samples from five patients were subjected to whole exome sequencing (WES) and targeted deep sequencing (TDS). ITH extent was measured by the average percentage of non-ubiquitous mutations (present in only some tumor regions). Matched cfDNAs were also analyzed by WES and TDS. Profiling efficiencies of a single tumor specimen and of cfDNA were compared, and the one that better depicted the mutational landscape was selected to screen for therapeutic targets. Results: We found variable extents of ITH in HCCs and observed branched and parallel evolution patterns. ITH level was lower at the higher sequencing depth of TDS than when measured by WES (28.1% vs 34.9%, P < 0.01), but it remained unchanged when additional samples were analyzed. TDS of a single tumor specimen detected an average of 70% of the total mutations in HCC. Although more mutations were detected in cfDNA by TDS than by WES, cfDNA uncovered an average of only 47.2% of total HCC mutations, suggesting that tissue outperforms cfDNA and that the latter may serve as an alternative in profiling the HCC genome. Consequently, TDS of single tumor tissue in 66 patients and of cfDNAs in four unresectable HCCs identified therapeutic targets in 38.6% of patients (26/66 and 1/4). Conclusions: TDS of a single tumor specimen could largely circumvent ITH to uncover mutations indicative of targeted therapy in HCC. Illumina HiSeq 4000;ILLUMINA 124
EGAD00010001221 Illumina Omni 2.5M SNPchip data (build37) of Ethiopian samples from the Pagani et al. 2015 AJHG paper (doi: http://dx.doi.org/10.1016/j.ajhg.2015.04.019) Illumina HumanOmni2-5_8v1_A 124
EGAD00001000206 UK10K_RARE_COLOBOMA REL-2012-07-05 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 123 vcf
EGAD00001000415 UK10K_RARE_COLOBOMA REL-2013-04-20 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 123 vcf
EGAD00001003145 Sensory neurons are nerve cells that are activated by sensory input such as heat and light, and convey information to the brain. Although they are a key cell type in complex organisms, human sensory neurons are challenging to study because they are impossible to obtain from living donors. We have collaborated with the Neucentis Pharmaceutical Research Unit to differentiate sensory-neuron-like cells from human induced pluripotent stem cells derived as part of the Human Induced Pluripotent Stem Cells Initiative. We will sequence RNA from 100 iPS lines derived from healthy individuals, performing RNA-seq on the differentiated cells to identify noncoding variants that alter gene expression in human sensory neurons. Illumina MiSeq;ILLUMINA, Illumina HiSeq 2000;ILLUMINA 123
EGAD00001000414 UK10K_RARE_CILIOPATHIES REL-2013-04-20 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 122 vcf
EGAD00001002716 In this study we characterized genomic alterations in two to five metachronous bladder tumors from 29 patients initially diagnosed with early-stage disease. Fourteen patients (32 tumors) had non-progressive disease (NPD) and 15 patients (34 tumors) had progressive disease (PD). Whole exome sequencing (WES, ~50x mean read depth) and whole transcriptome RNA-seq were performed (RNA was not available for 4 tumors). Data provided here consist of 122 BAM files for WES (83 tumors and 39 blood). Illumina HiSeq 2000;ILLUMINA 122 bam
EGAD00001000209 UK10K_RARE_FIND REL-2012-07-05 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 121 vcf
EGAD00001000419 UK10K_RARE_SIR REL-2013-04-20 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 121 vcf
EGAD00001000184 UK10K_NEURO_FSZ_REL_2012_01_13 Illumina HiSeq 2000; 120 vcf
EGAD00001000240 UK10K_NEURO_FSZ_REL_2012_07_05 Illumina HiSeq 2000; 120 vcf
EGAD00001000295 UK10K_RARE_HYPERCHOL REL-2012-11-27 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 120 vcf
EGAD00001000598 The Ethiopian area is among the most ancient ever occupied by human populations and their ancestors. In particular, according to archaeological evidence, the presence of hominids can be traced back at least 3 million years. Furthermore, present-day human populations in the area show great cultural, linguistic and historical diversity, which makes them essential candidates for investigating a considerable part of African variability. Following the typing of 300 Ethiopian samples on Illumina Omni 1M (see Human Variability in Ethiopia project, previously approved by the Genotyping committee), we now have a clearer idea of which populations living in the area include most of the diversity. This project therefore aims to sequence the whole genomes of 300 individuals at low (4-8x) depth, belonging to the six most representative populations of the Ethiopian area, to produce a unique catalogue of variants peculiar to North East Africa. Furthermore, 6 samples (one from each population) will also be sequenced at high (30x) depth to ensure full coverage of the diversity spectrum. The retrieved variants will be of great help in evaluating the demographic dynamics of those populations as well as shedding light on the migrations out of Africa. Illumina HiSeq 2000; 120 bam
EGAD00010000566 HipSci - Healthy Normals - Genotyping Array - May 2014 120
EGAD00010000564 HipSci - Healthy Normals - Expression Array - May 2014 120
EGAD00010000614 40 Druze Trios 120
EGAD00010001005 Illumina HumanCoreExome-12v1-1_A chip typing in a Greek adolescent population Illumina Human Core Exome 12v1.1 120
EGAD00001003562 This dataset includes bam files from 120 samples. These samples were sequenced using 2x150bp reads on an Illumina HiSeqX sequencer and aligned using the Isaac aligner. All samples were processed with TruSeq DNA PCR-free sample preparation. HiSeq X Ten;ILLUMINA 120
EGAD00001000318 UK10K_NEURO_FSZ REL-2012-11-27 Illumina HiSeq 2000; 119 vcf
EGAD00001000123 Polycythemia Vera Myeloproliferative Disease exome sequencing Illumina HiSeq 2000, Illumina Genome Analyzer II, Illumina HiSeq 2000; 119 bam,srf
EGAD00010000636 WTCCC2 Visceral Leishmaniasis samples from Brazil using Illumina 670k 119
EGAD00010000478 blood-based gene expression from breast cancer cases and age-matched controls in case-control series 3 (CC3) Illumina 118
EGAD00001001601 The intersection of genome-wide association analyses with physiological and functional data indicates that variants regulating islet gene transcription influence type 2 diabetes (T2D) predisposition and glucose homeostasis. However, the specific genes through which these regulatory variants act remain poorly characterized. To identify such effector transcripts for T2D and glycemic traits, we generated expression quantitative trait locus (eQTL) data in 118 human islet samples using RNA-sequencing and high-density genotyping. Illumina HiSeq 2000; 118 vcf,bam
EGAD00001002896 Amplicon sequencing libraries from the study "Histological Transformation and Progression in Follicular Lymphoma: a Clonal Evolution Study". These are Illumina amplicon deep sequencing libraries (n = 118) to validate somatic predictions made in the whole genome sequencing libraries. Specifically, there are 72 tumor libraries and 46 normal libraries. Some patients may have multiple amplicon libraries sequenced. Illumina HiSeq 2000;ILLUMINA 118
EGAD00001003268 HGSC cases in the OvCaRe and CRCHUM Tumour Banks were selected according to the following criteria: (i) were administered platinum-taxane-based therapy; (ii) relapsed within 12 months (365 days) or had more than 4.5 years (1642.5 days) of follow-up data; (iii) had at least 50% tumour content by H&E staining and expert pathology review. All cases were re-reviewed by expert pathologists to confirm the diagnosis of HGSC. Germline BRCA1 and BRCA2 status was determined for all patients through hereditary cancer screening programs. The discovery cohort was designed to amplify biological differences by selecting cases from the extremes of the outcome distribution. All HGSC tumours are primary tumour samples. Library construction and sequencing: Frozen specimens with >50% tumour cellularity (based on initial slide review) were used for cryosectioning and subsequent nucleic acid extraction. Tumour samples were derived from primary, untreated, fresh-frozen tumour specimens harvested at diagnosis during standard-of-care debulking surgery. Germline DNA was obtained from peripheral blood buffy coat for all specimens except 13 from Tokyo, for which non-cancer frozen tissue was used as the germline source. DNA from both matched normal (blood) and tumour samples (frozen tissue) was extracted using the QIAamp Blood and Tissue DNA kit (Qiagen) and quantified using a Qubit fluorometer and reagents (high-sensitivity assay). Three lanes of Illumina HiSeq 2500 v4 chemistry were obtained for normal samples and five lanes for tumour samples. The PCR-free protocol was adopted to eliminate PCR-induced bias and improve coverage across the genome. Illumina HiSeq 2000;ILLUMINA 118
EGAD00001001988 Cholangiocarcinoma whole genome sequencing data HiSeq X Ten (ILLUMINA), Illumina HiSeq 2000 (ILLUMINA), Illumina HiSeq 2500 (ILLUMINA) 118
EGAD00001000307 UK10K_RARE_COLOBOMA REL-2012-11-27 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 117 vcf
EGAD00001000219 UK10K_RARE_NEUROMUSCULAR REL-2012-07-05 Illumina HiSeq 2000; 117 vcf
EGAD00001000636 The ETV6-RUNX1 fusion gene, found in 25% of childhood acute lymphoblastic leukemia (ALL), is acquired in utero but requires additional somatic mutations for overt leukemia. We used exome and low-coverage whole-genome sequencing to characterize the critical secondary events associated with leukemic transformation. RAG-mediated deletions emerge as the dominant mutational process, accounting for at least 43% of genomic rearrangements and characterized by the presence of recombination signal sequence motifs near the breakpoints, incorporation of non-templated sequence at the junction, and a ten-fold enrichment at promoters and enhancers of genes actively transcribed in early B-lineage development. Single-cell tracking shows that this mechanism is not restricted to one founder cell but is active throughout leukemic evolution. Integration of point mutation and rearrangement data identifies recurrent inactivation of ATF7IP and MGA as two new tumor suppressor genes. Thus, a remarkably parsimonious mutational process transforms ETV6-RUNX1 lymphoblasts, striking promoters and enhancers of the genes that normally control B-cell differentiation. Illumina Genome Analyzer II; 117 bam
EGAD00010000736 AAD case and control samples from UK and Norway 117
EGAD00001002064 Zhong Shan Hospital liver tumor single cell sequencing: 111 single cell and 6 tissues Illumina HiSeq 2500; 117 fastq
EGAD00001000059 Screening for human epigenetic variation at CpG islands Illumina Genome Analyzer II 116 bam
EGAD00001002126 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: PRAD-UK. 116 bam,bai,readme_file
EGAD00010000562 Medulloblastoma DNA methylation Illumina_HumanMethylation450 115
EGAD00001002006 Whole genome sequencing of paediatric glioblastoma in the ICGC PedBrain project Illumina HiSeq 2500; 115 fastq
EGAD00001000107 SCAT osteosarcoma sequencing Illumina HiSeq 2000, Illumina Genome Analyzer II 114 bam
EGAD00001000239 EGAD00001000239_UK10K_NEURO_IMGSAC_REL_2012_07_05 Illumina HiSeq 2000; 114 vcf
EGAD00001000282 Neuroblastomas are tumors of peripheral sympathetic neurons and are the most common solid tumor in children. To determine the genetic basis for neuroblastoma we performed whole-genome sequencing (6 cases), exome sequencing (16 cases), genome-wide rearrangement analyses (32 cases), and targeted analyses of specific genomic loci (40 cases) using massively parallel sequencing. On average each tumor had 19 somatic alterations in coding genes (range, 3-70). Among genes not previously known to be involved in neuroblastoma, chromosomal deletions and sequence alterations of chromatin remodeling genes, ARID1A and ARID1B, were identified in 8 of 71 tumors (11%) and were associated with early treatment failure and decreased survival. Using tumor-specific structural alterations, we developed an approach to identify rearranged DNA fragments in sera, providing personalized biomarkers for minimal residual disease detection and monitoring. These results highlight dysregulation of chromatin remodeling in pediatric tumorigenesis and provide new approaches for the management of neuroblastoma patients. Illumina HiSeq 2000;, Illumina Genome Analyzer IIx; 114 fastq
EGAD00001000626 Exome sequencing data for tumor and matched normal samples of the EGAS00001000495 project. Illumina HiSeq 2000; 114 fastq
EGAD00001003134 DATA FILES FOR GRUBER SJAMLM7 EXOME Illumina HiSeq 2000;ILLUMINA 114
EGAD00001003363 Whole-exome sequencing on Illumina HiSeq2000/2500 of patient-derived xenografts established from colorectal cancer primary tumor samples (EPO2_cohort) Illumina HiSeq 2000;ILLUMINA 114
EGAD00001000329 UK10K_RARE_THYROID REL-2012-11-27 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 113 vcf
EGAD00001000441 UK10K_NEURO_IMGSAC REL-2013-04-20 Illumina HiSeq 2000; 113 vcf
EGAD00001002124 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: EOPC-DE. 113 readme_file,bai,bam
EGAD00001003391 DCM controls (113 human non-DCM samples): human heart biopsies from 113 non-diseased controls were subjected to RNA sequencing in order to assess transcriptome variation. We used Illumina HiSeq2000 technology. Each sample dataset contains the output from tophat-1.4.1 (one *.bam file with the aligned reads and two *.fq files, one with the unaligned forward reads and one with the unaligned reverse reads). We reveal extensive differences in gene expression and splicing between dilated cardiomyopathy patients and controls. Illumina HiSeq 2000;ILLUMINA 113
EGAD00001000280 This experiment is to validate putative somatic substitutions and indels identified in an exome screen of ~50 osteosarcoma tumour/normal pairs. It is the first stage in our ICGC commitment to study osteosarcoma. The validation process is an important component of our analysis to clarify the data prior to looking for evidence of new cancer genes, or subverted pathways important in the development of cancer. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2000; 112 bam
EGAD00001001060 Illumina HiSeq 2000; 112 bam
EGAD00001000995 HIPO blastemal Wilms (nephroblastoma) characterisation of tumor driving DNA alterations Illumina HiSeq 2000; 112 fastq
EGAD00001001673 Part of the WGS data of the Malignant Lymphoma study (ICGC) Illumina HiSeq 2000;, Illumina HiSeq 2500; 112 readme_file,fastq
EGAD00001000320 UK10K_NEURO_IMGSAC REL-2012-11-27 Illumina HiSeq 2000; 111 vcf
EGAD00001000334 UK10K_RARE_SIR REL-2012-11-27 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 111 vcf
EGAD00001000266 This study uses a focused, bespoke bait pull-down library method to target findings of osteosarcoma whole genome and whole exome sequencing studies in order to validate them. This method will also be applied to a larger set of tumour-only samples in order to determine the prevalence of these findings in a larger set of patients. Illumina HiSeq 2000; 110 bam
EGAD00010000476 blood-based gene expression from breast cancer cases and age-matched controls in case-control series 1 (CC1) Illumina 110
EGAD00001000730 The VBSEQ project aims to combine available extensive genetic and phenotypic data with the latest high-throughput genome sequencing technology and ad hoc statistical analysis to identify new rare genetic variants underlying complex traits. Up to 100 Val Borbera samples will be sequenced to 6x depth. Illumina HiSeq 2000; 110 bam
EGAD00010001298 primary human ACC and normal samples using 450K Illumina_450K 110
EGAD00001000647 We are sequencing the exomes of patients with paroxysmal neurological disorders mainly focusing on migraine and epilepsy. Cases are collected from performance sites of members of EuroEPINOMICS. Most cases have a strong family history. The study sample will include both cases and controls. Illumina HiSeq 2000; 110 bam,cram
EGAD00001000113 Mutational landscapes of primary triple negative breast cancers - Exomes Illumina Genome Analyzer IIx, Illumina Genome Analyzer IIx; 108 bam
EGAD00001000296 UK10K_RARE_CILIOPATHIES REL-2012-11-27 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 108 vcf
EGAD00001000147 Osteosarcoma Whole Genome Illumina HiSeq 2000, Illumina HiSeq 2000; 108 bam,cram
EGAD00001000385 Whole-genome libraries will be prepared from at least two serial samples reflecting different stages of disease progression, and from matched constitutional DNA, for 30 Myeloproliferative Disease samples. Five lanes of Illumina HiSeq sequencing will be performed on each of the tumour samples and four lanes on each of the constitutional DNA samples. Sequencing data will be mapped to build 37 of the human reference genome, and analysis will be performed to characterize the spectrum of somatic variation present in these samples, including single base pair mutations, insertions and deletions as well as larger structural variants and genomic rearrangements. Illumina HiSeq 2000; 108 bam,cram
EGAD00010000466 Down syndrome CNV genotyping data NimbleGen 135K aCGH - NimbleScan 108
EGAD00001001630 release_2: ICGC PedBrain: whole genome bisulfite sequencing Illumina HiSeq 2000; 108 readme_file,fastq
EGAD00001002215 Low-coverage whole genome sequencing of plasma DNA from 50 male and 54 female non-cancer donors. For the analysis of nucleosome positioning, all data from the non-cancer controls were merged. Furthermore, two patients with metastasized breast cancer were sequenced on a NextSeq at higher depth. NextSeq 550;, Illumina MiSeq; 108
EGAD00001000258 Deep RNA sequencing in CLL Illumina Genome Analyzer II; 107 fastq
EGAD00001000679 A bespoke targeted pull-down experiment will be performed on patients with angiosarcoma. The resulting products will be sequenced to determine the prevalence of previously found mutations in these patients. Illumina HiSeq 2000; 107 bam
EGAD00010001047 APCDR AGV Project: Array data from 107 Ethiopians (Amhara, Oromo, Somali; subset of Ethiopian Genome Project Genotyping). Raw data, intensity files and post-QC Plink files. Illumina HumanOmni2-5_8v1_A 107
EGAD00001000246 Integrative Oncogenomics of multiple myeloma Illumina HiSeq 2000; 106 bam
EGAD00001000810 Dataset for whole exome sequencing of 49 tumor-blood pairs and transcriptome sequencing of 44 tumors for adrenocortical tumors Illumina HiSeq 2000; 106 fastq
EGAD00001000774 This study includes whole-genome sequencing data (at 4x depth) of 100 individuals from an Italian genetic isolate population (Carlantino, abbreviated CARL) of the Italian Network of Genetic Isolates (INGI). The INGI-CARL_SEQ project aims to combine available extensive genetic and phenotypic data with the latest high-throughput genome sequencing technology and ad hoc statistical analysis to identify new rare genetic variants underlying complex traits. Illumina HiSeq 2000; 106 cram
EGAD00001002384 ChIP-Seq data for 106 Chronic Lymphocytic Leukemia sample(s). 173 run(s), 162 experiment(s), 162 analysis(s) on human genome GRCh38. Part of BLUEPRINT release August 2016. Analysis documentation available at http://ftp.ebi.ac.uk/pub/databases/blueprint/releases/20160816/homo_sapiens/README_chipseq_analysis_ebi_20160816 Illumina HiSeq 2000;ILLUMINA 106 bam,fastq
EGAD00001002916 ATAC-seq data for 106 sample(s) Chronic Lymphocytic Leukemia from venous blood, on Genome GRCh38. 111 run(s), 109 experiment(s), 109 alignment(s). Part of BLUEPRINT (September 2016). NextSeq 500;ILLUMINA 106 bam,fastq
EGAD00001003320 Transcriptome sequencing of tumour tissue, adjacent normal tissue and derived organoids/tumoroids from colorectal cancer. This dataset contains all the data available for this study on 2017-05-04. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 106
EGAD00001003357 Aligned, merged and deduplicated BAM files from HiSeq whole exome sequencing of 106 samples: matched tumour-normal pairs from 53 melanoma patients. 106
EGAD00001003585 Genomics-Driven Precision Medicine for Advanced Pancreatic Cancer - Early Results from the COMPASS Trial - WGS mapped reads 106
EGAD00001002188 Paired-end BAM files of mitochondrial whole genome deep sequencing (mtWGDS) analysis Illumina HiSeq 2500; 105 bam
EGAD00001001208 Targeted capture of cancer gene panel bait set in single cell derived organoids from colon tissue and colorectal cancer from 1 patient. Illumina HiSeq 2000;, Illumina HiSeq 2500; 105 cram
EGAD00001003159 Bam files consisting of aligned MeDIP-seq reads from cord blood cells and cord blood mononuclear cells of twins not conceived through in vitro fertilisation Illumina Genome Analyzer II;ILLUMINA 105
EGAD00010001200 Genotyping data from Indonesian sea nomad and surrounding populations Illumina Omni 5 105
EGAD00001000125 Chondrosarcoma Exome Illumina HiSeq 2000, Illumina HiSeq 2000; 104 bam
EGAD00001000052 UK10K_NEURO_MUIR REL-2011-01-28 Illumina Genome Analyzer II; 104 vcf,bam,bam
EGAD00010000148 tumour samples using Affymetrix Genome-Wide SNP6.0 arrays Affymetrix_GenomeWide_SNP6.34 104
EGAD00001001024 Fastq files of 52 samples of hepatocellular carcinoma (RCAST, THCC) Illumina HiSeq 2000; 104 fastq
EGAD00001002248 A total of 49 tumor specimens from 20 patients, with matched normal/blood, were subjected to whole-exome and/or whole-transcriptome sequencing. Tumor samples were acquired in 4 categories: 1) locally adjacent tumors, 2) multifocal/multicentric tumors, 3) 5-ALA (+/-) tumors and 4) longitudinal tumors. Illumina HiSeq 2500; 104
EGAD00001000222 Exome capture sequencing of SCLC tumor/normal pairs and cell lines Illumina HiSeq 2000; 103 fastq
EGAD00001002897 Whole genome sequencing libraries from the study "Histological Transformation and Progression in Follicular Lymphoma: a Clonal Evolution Study". These are libraries from 41 patients. Specifically: 15 transformed follicular lymphoma (TFL), 6 early progressers (PFL), and 20 non-early progressers (NPFL). For TFL and PFL patients, trios consisting of diagnostic (T1), transformed/progressed (T2) and a matching normal are available (n = 63 libraries in total). For NPFL patients, a tumor-normal pair is available (n = 40 libraries). 103
EGAD00001000118 Osteosarcoma Exome Sequencing Illumina Genome Analyzer II, Illumina HiSeq 2000; 102 bam
EGAD00001000714 102 bam
EGAD00001000902 The dataset includes the targeted gene sequencing data from 51 pairs of gallbladder cancer tissues and patient-matched normal tissues. Illumina HiSeq 2500; 102 bam
EGAD00001000815 Exome-seq, RNA-Seq, SNP array profiling of gastric tumor samples. Illumina HiSeq 2000; 102 fastq
EGAD00001001452 Anaplastic oligodendrogliomas (AOs) are rare primary brain tumors which are generally incurable, with heterogeneous prognosis and few treatment targets identified. Most oligodendrogliomas have chromosome 1p/19q co-deletion and IDH mutation. We analyzed 51 AOs by whole-exome sequencing, identifying previously reported frequent somatic mutations in CIC and FUBP1. We also identified recurrent mutations in TCF12, both in this set and in an additional series of 83 AOs. Overall, 7.5% of AOs are mutated for TCF12, which encodes an oligodendrocyte-related transcription factor. 80% of the TCF12 mutations identified were either in the bHLH domain, which is important for TCF12 function as a transcription factor, or were frameshift mutations leading to TCF12 truncated for this domain. We show that these mutations compromise TCF12 transcriptional activity and are associated with a more aggressive tumor type. Our analysis provides further insights into the unique and shared pathways driving AO. Illumina HiSeq 2000; 102 bam
EGAD00001001899 HDAC and PI3K Antagonists Cooperate to Inhibit Growth of MYC-driven Medulloblastoma 102 bam
EGAD00001003392 High-coverage WGS of DNA samples from 51 pairs of GCs was performed on the Illumina HiSeq X Ten System. Illumina HiSeq 2000;ILLUMINA 102
EGAD00001002890 Exome sequencing of 96 French-Canadians 102
EGAD00001003751 Whole genome sequencing data for primary tumors, matching control material from blood and their corresponding organoid. Whole transcriptome data for organoids. HiSeq X Ten;ILLUMINA, NextSeq 500;ILLUMINA 102
EGAD00010001452 Genome-wide SNP genotyping data for 102 Pakistani individuals by Illumina HumanOmni2.5-8 array, used in the EGAS00001002558 study Illumina HumanOmni2.5-8 102
EGAD00001000390 We propose to definitively characterise the somatic genetics of triple negative breast cancer through generation of comprehensive catalogues of somatic mutations in breast cancer cases by high coverage genome sequencing coupled with integrated transcriptomic and methylation analyses. Illumina HiSeq 2000; 101 bam,cram
EGAD00001000126 HER2 positive Breast Cancer Illumina HiSeq 2000 101 bam,cram
EGAD00010000486 ccRCC case samples using expression array Agilent Human Whole Genome 4x44k v2 - Feature Extraction 101
EGAD00001001394 Samples from Ross Innes et al. 2015 - doi:10.1038/ng.3357 Illumina HiSeq 2000; 101 bam
EGAD00001001856 100 other
EGAD00001000731 This study includes Phase 2 whole-genome sequencing data (at 4x depth) of 100 individuals from an Italian genetic isolate population (Val Borbera, abbreviated VBI) of the Italian Network of Genetic Isolates (INGI). The INGI-VBI_SEQ2 project aims to combine available extensive genetic and phenotypic data with the latest high-throughput genome sequencing technology and ad hoc statistical analysis to identify new rare genetic variants underlying complex traits. Illumina HiSeq 2000; 100 bam
EGAD00001000842 RIKEN collection WGS reads for 100 HCC and matched blood samples from 50 donors submitted to ICGC for release 16 Illumina HiSeq 2000; 100 fastq
EGAD00001001007 Low depth (4x) Illumina HiSeq raw sequence data for 100 unrelated Zulu from Durban area, South Africa. Illumina HiSeq 2000; 100 bam,cram
EGAD00001001008 Low depth (4x) Illumina HiSeq raw sequence data for 100 unrelated Baganda from rural Uganda. Illumina HiSeq 2000; 100 bam
EGAD00001001372 All humans outside Africa are descendants of the same single exit, usually dated to 50-70 thousand years ago. However, the route taken out of Africa is still debated. The two main candidates are a northern route via Egypt and the Levant, or a southern route via Ethiopia and the Arabian Peninsula. We are generating genetic data to evaluate these two possibilities. In this study we propose to generate low-coverage sequencing data for 100 Egyptian samples. Illumina HiSeq 2000; 100 cram
EGAD00001001273 Whole genome sequencing was performed with DNA extracted from fresh-frozen tumor and normal material. Short insert DNA libraries were prepared with the TruSeq DNA PCRfree sample preparation kit (Illumina) for paired-end sequencing at a minimum read length of 2x100bp. Human DNA libraries were sequenced to a minimum average coverage of 30x for both tumor and matched normal. Murine DNA libraries of tumor and matched normal were both sequenced to a coverage of 25x. Illumina HiSeq 2000; 100 bam
EGAD00001001440 This project entailed generation of high depth WGS (30x) of 100 individuals from the general Greek population. HiSeq X Ten; 100 cram
EGAD00001002256 This dataset comprises whole exome sequencing of ER-positive breast cancer in Korean patients under 35. It provides 100 alignment files from normal-tumor paired whole exome sequencing of 50 patients. This is part of the total project dataset. Illumina HiSeq 2500;ILLUMINA, Illumina HiSeq 2500; 100 bam
EGAD00010001053 APCDR AGV Project: Array data from 100 Banyarwanda. Raw data, intensity files and post-QC Plink files. Illumina HumanOmni2.5-4v1_B and HumanOmni2-5_8v1_A 100
EGAD00010001058 APCDR AGV Project: Array data from 100 Ga-Adangbe. Raw data, intensity files and post-QC Plink files. Illumina HumanOmni2.5-4v1_B and HumanOmni2-5_8v1_A 100
EGAD00010001055 APCDR AGV Project: Array data from 100 Baganda. Raw data, intensity files and post-QC Plink files. Illumina HumanOmni2.5-4v1_B and HumanOmni2-5_8v1_A 100
EGAD00010001056 APCDR AGV Project: Array data from 100 Zulu. Raw data, intensity files and post-QC Plink files. Illumina HumanOmni2.5-4v1_B and HumanOmni2-5_8v1_A 100
EGAD00010001052 APCDR AGV Project: Array data from 100 Kalenjin. Raw data, intensity files and post-QC Plink files. Illumina HumanOmni2.5-4v1_B 100
EGAD00010001223 Illumina Omni 2.5M SNPchip data (build37) of Egyptian samples from the Pagani et al. 2015 AJHG paper (doi: http://dx.doi.org/10.1016/j.ajhg.2015.04.019) Illumina HumanOmni2-5M-8v1-1_B 100
EGAD00001003262 High-coverage WES sequencing of DNA samples from 50 PTCs was performed on the Illumina HiSeq 2500 or 4000 System Illumina HiSeq 2000;ILLUMINA 100
EGAD00001003559 ICGC PCAWG Dataset for RNA-Seq BAM aligned using TopHat2. Project: RECA-EU. 100
EGAD00001003558 ICGC PCAWG Dataset for RNA-Seq BAM aligned using Star. Project: RECA-EU. 100
EGAD00001003762 Whole Exome sequencing of paediatric High Grade Gliomas Illumina HiSeq 2000;ILLUMINA 100
EGAD00010000886 samples using Affymetrix HG_U133_+2 Affymetrix HG_U133_+2 99
EGAD00010001049 APCDR AGV Project: Array data from 99 Kikuyu. Raw data, intensity files and post-QC Plink files. Illumina HumanOmni2.5-4v1_B 99
EGAD00010001045 APCDR AGV Project: Array data from 99 Igbo. Raw data, intensity files and post-QC Plink files. Illumina HumanOmni2.5-4v1_B 99
EGAD00001003098 Low-coverage sequencing data from 99 Lebanese samples Illumina HiSeq 2500;ILLUMINA 99 cram
EGAD00001003560 ICGC PCAWG Dataset for RNA-Seq BAM aligned using Star. Project: MALY-DE. 99
EGAD00001003561 ICGC PCAWG Dataset for RNA-Seq BAM aligned using TopHat2. Project: MALY-DE. 99
EGAD00001001334 We propose to definitively characterise the somatic genetics of breast cancer through generation of comprehensive catalogues of somatic mutations in breast cancer cases by high coverage genome sequencing coupled with integrated transcriptomic and methylation analyses. Illumina HiSeq 2000 99
EGAD00001000185 UK10K_RARE_COLOBOMA REL-2012-02-22 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 98 vcf
EGAD00010000474 blood-based gene expression from breast cancer cases and age-matched controls in case-control series 2 (CC2) Illumina 98
EGAD00001000876 Illumina HiSeq 2000; 98 fastq
EGAD00001002154 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: PAEN-AU. 98 readme_file,bai,bam
EGAD00001002664 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: CMDI-UK. 98 readme_file,bai,bam
EGAD00001003216 Whole genome sequencing of tumour normal pairs of human undifferentiated sarcomas. HiSeq X Ten;ILLUMINA 98
EGAD00001000104 Acute Lymphoblastic Leukemia Exome sequencing 2 Illumina Genome Analyzer II 97 bam
EGAD00001000021 Paroxysmal neurological disorders Illumina HiSeq 2000, Illumina Genome Analyzer II, Illumina HiSeq 2000; 97 bam,srf
EGAD00001000613 UK10K_NEURO_ASD_MGAS REL-2013-04-20 Illumina HiSeq 2000; 97 vcf
EGAD00010000638 WTCCC2 Visceral Leishmaniasis samples from India using Illumina 670k 97
EGAD00010001051 APCDR AGV Project: Array data from 97 Barundi. Raw data, intensity files and post-QC Plink files. Illumina HumanOmni2-5_8v1_A 97
EGAD00001000312 UK10K_NEURO_ASD_MGAS REL-2012-11-27 Illumina HiSeq 2000; 96 vcf
EGAD00001001412 Whole genome sequencing of 48 tumor/normal pairs obtained from adult T-cell leukemia/lymphoma. This data set includes 11 full-pass WGS and 37 low-pass WGS data. Illumina HiSeq 2000;, HiSeq X Ten; 96 bam
EGAD00001002183 This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2000; 96 bam,cram
EGAD00001002230 Patient-derived xenografts (n=96) were derived from metastatic melanoma patients. RNA expression profiling will be performed to study 1. HLA-typing and 2. the effect of the tumour microenvironment on tumour growth. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2000; 96 cram
EGAD00010001003 This data set contains two data files. The first data file (file name: PREDO_GA_EGA_methylation_data.csv) includes methylation data from 485512 sites across the human genome from 96 individuals, acquired with the Illumina 450K chip. The other data file (file name: PREDO_GA_EGA_phenotypes.csv) contains the gestational ages and the genders of the 96 samples. Illumina 450K-chip (methylation data) 96
EGAD00001003298 BAM outputs from RSEM (https://deweylab.github.io/RSEM/) analysis of RNA-seq data generated on the HiSeq platform from tumour samples of 95 pancreatic adenocarcinoma cases. 96
EGAD00001002738 Background: In follicular lymphoma (FL), studies addressing the prognostic value of microenvironment-related immunohistochemical (IHC) markers and tumor cell-related genetic markers have yielded conflicting results, precluding implementation in practice. Therefore, the Lunenburg Lymphoma Biomarker Consortium (LLBC) performed a validation study for published markers. Methods: To maximize sensitivity, an end-of-spectrum design was applied for 122 uniformly immunochemotherapy-treated FL patients retrieved from international trials and registries; early failure (EF): progression or lymphoma-related death <2 years versus long remission: response duration of >5 years. IHC staining for T-cells and macrophages was performed on tissue microarrays from initial biopsy and scored with a validated computer-assisted protocol. Shallow whole-genome and deep targeted sequencing was performed on the same samples. Results: 96/122 cases with complete molecular and immunohistochemical data were included in the analysis. EZH2 wild-type (p=0.006), gain of chromosome 18 (p=0.002), low percentages of CD8+ cells (p=0.011) and CD163+ areas (p=0.038) were associated with EF. No significant differences in other markers were observed, thereby refuting previous claims on their prognostic significance. Conclusion: Using an optimized study design, this LLBC study validates wild-type EZH2 status, gain of chromosome 18, low percentages of CD8+ cells and CD163+ area as predictors of EF to immunochemotherapy in FL. Illumina HiSeq 2000;ILLUMINA 96
EGAD00001003335 As a resource for assessing exon CNV calling methods in targeted NGS data, we present the ICR96 exon CNV validation series. The dataset includes high-quality sequencing data from a targeted NGS assay (the TruSight Cancer Panel) together with Multiplex Ligation-dependent Probe Amplification (MLPA) results for 96 independent samples. 66 samples contain at least one validated exon CNV and 30 samples have validated negative results for exon CNVs in 26 genes. The dataset includes 46 exon CNVs in BRCA1, BRCA2, TP53, MLH1, MSH2, MSH6, PMS2, EPCAM and PTEN, giving excellent representation of the cancer predisposition genes most frequently tested in clinical practice. Moreover, the validated exon CNVs include 25 single-exon CNVs, the most difficult type of exon CNV to detect. Illumina HiSeq 2500;ILLUMINA 96
EGAD00001000182 UK10K_NEURO_UKSCZ REL-2012-01-13 Illumina HiSeq 2000; 95 vcf
EGAD00001000347 These samples include exome sequences of family members of Finnish origin with dyslipidemias. Illumina HiSeq 2000; 95 bam
EGAD00001000717 Dataset of CageKid Tumor DNA samples 95 bam
EGAD00001000709 Dataset of CageKid Blood DNA samples 95 bam
EGAD00001000799 Atrio-ventricular septal defects (AVSD) are a specific form of congenital heart structural defect that result from abnormal or inadequate fusion of endocardial cushions during cardiac development. This project is focused on identifying rare coding variation that substantially increases risk of AVSD, by exome sequencing of AVSD patients and some of their family members, and comparing to control datasets from other sources. The exome sequencing is performed using Agilent SureSelect 50Mb exome v3 and HiSeq 75bp paired reads with a mean sequencing coverage target of 50X. Illumina HiSeq 2000; 95 bam
EGAD00001002149 Low coverage whole genome sequencing for the identification of somatic copy number alterations (SCNA) and focal amplification mapping in plasma DNA of prostate cancer patients Illumina MiSeq; 95 fastq
EGAD00001003270 ICGC DCC Release 24, PACA-CA Whole Genome sequence merged alignments 95
EGAD00001003433 RNA-Seq data for the paper titled "Orthotopic Patient-Derived Xenografts of Pediatric Solid Tumors" Illumina HiSeq 2000;ILLUMINA 95
EGAD00001001978 This dataset contains FASTQ files for multi-region exome sequencing of EGFR-mutant lung adenocarcinomas from Asian patients. There are 16 patients and 95 samples in total, including 16 controls and 79 tumors. Each sample was sequenced in multiple runs, giving 368 FASTQ files in total. Please use the sample ID in each file name to merge runs. Illumina HiSeq 2000 (ILLUMINA) 95
EGAD00001001979 This dataset contains BAM files for multi-region exome sequencing of EGFR-mutant lung adenocarcinomas from Asian patients. There are 16 patients and 95 samples in total, including 16 controls and 79 tumors. Illumina HiSeq 2000; 95
EGAD00001001980 This dataset contains BAM files of targeted Amplicon deep-sequencing data, for validation of the mutations found in WES. There are 16 patients and 95 samples in total, including 16 controls and 79 tumors. Illumina HiSeq 2500; 95
EGAD00001001981 This dataset contains FASTQ files of targeted Amplicon deep-sequencing data, for validation of the mutations found in WES. There are 16 patients and 95 samples in total, including 16 controls and 79 tumors. There are 140 FASTQ files in total, with multiple runs for some of the samples. Please use the sample ID in each file name to merge runs. Illumina HiSeq 2500; 95
EGAD00001000988 Validation/deeper sequencing for metastatic prostate cancer samples Illumina HiSeq 2500; 94 cram
EGAD00001001122 FFPE normal panel generation for use with V3 cancer panel 0618521 Illumina HiSeq 2000; 94 cram
EGAD00001003781 Paired whole exome sequencing for 32 primary MDS, 14 MDS/MPN, and 8 AML-MRC cases (total = 54). Normal comparator genomic DNA was extracted from lymphocytes purified by flow cytometry. Bulk myeloid cells were used as a source of tumor gDNA. Files uploaded are mapped BAM files. Illumina HiSeq 2000;ILLUMINA 94
EGAD00001002722 Exome sequencing for 26 patients with matched blood RNA-seq for 41 patients Illumina HiSeq 2500;ILLUMINA 93 fastq
EGAD00001003415 ICGC PCAWG Dataset for RNA-Seq BAM aligned using Star. Project: OV-AU. 93
EGAD00001003416 ICGC PCAWG Dataset for RNA-Seq BAM aligned using TopHat2. Project: OV-AU. 93
EGAD00001001074 miRNA-seq Cohort of 92 Fresh Frozen Diffuse Large B-cell Lymphoma Patient Samples 92 bam
EGAD00001003242 This study comprises three different datasets: 1) 57 samples from the 1243 canapps cell line study, 2) 91 FFPE normal samples and 3) 87 samples from the SCORT WS2 dataset. The aim is to sequence these 235 samples in order to test the new V2 Colorectal bait design. Illumina HiSeq 2000;ILLUMINA 92
EGAD00001001125 Exome sequencing of Untreated BCC samples. Illumina HiSeq 2000; 91 fastq
EGAD00001000718 Dataset of CageKid Tumor RNA samples 91 bam
EGAD00001001085 This dataset includes 2 pairs of tumour/normal whole genome sequence data as well as MEN1 gene targeted sequencing of an additional 87 specimens. Illumina MiSeq;, Illumina HiSeq 2500; 91 bam
EGAD00010000881 Digital images of ovarian cancer sections Aperio 91
EGAD00001000190 UK10K_RARE_FIND REL-2012-02-22 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 90 vcf
EGAD00001000344 Exome sequencing of 30 parent-offspring trios to >50X mean depth, where the offspring has sporadic TOF, to identify potential causal de novo mutations. We will use the exome plus design for pulldown that incorporates ~6.8Mb of additional regulatory sequences in addition to the ~50Mb GENCODE exome. Illumina HiSeq 2000; 90 bam
EGAD00001000697 Illumina HiSeq sequence data (with >30x coverage) were aligned to the hg19 human reference genome assembly using BWA (Li and Durbin, 2009); duplicate reads were removed from the final BAM file. No realignment or recalibration was performed. Illumina HiSeq 2000;, Illumina Genome Analyzer IIx; 90 bam
EGAD00001000720 Dataset of CageKid tumor-normal paired RNA samples 90 bam
EGAD00001000773 We aim to provide a powerful reference set for genome-wide association studies (GWAS) in African populations. Our pilot study to sequence 100 individuals each from Fula, Jola, Mandinka and Wolof from the Gambia to low coverage has been completed - this first part of the main effort will make available low coverage WGS data for 400 individuals from multiple ethnic groups in Burkina Faso, Cameroon, Ghana and Tanzania. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2000; 90 cram
EGAD00001002122 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: BRCA-UK. 90 readme_file,bai,bam
EGAD00001003246 Whole exome sequencing of hepatosplenic T cell lymphoma (HSTL) tumors, paired normals, and cell lines, including (1) 68 exome capture, paired-end Illumina Hiseq sequencing, BAM files from HSTL tumor samples, (2) 20 exome capture, paired-end Illumina Hiseq sequencing, BAM files from HSTL paired normal samples, and (3) 2 exome capture, paired-end Illumina Hiseq sequencing, BAM files from HSTL cell lines. Illumina HiSeq 2500;ILLUMINA 90
EGAD00001000016 Familial Melanoma Sequencing Illumina HiSeq 2000, Illumina Genome Analyzer II, Illumina HiSeq 2000; 89 bam,srf
EGAD00001000151 UK10K OBESITY REL-2011-07-14 Illumina HiSeq 2000; 88 vcf
EGAD00001000207 UK10K_RARE_HYPERCHOL REL-2012-07-05 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 88 vcf
EGAD00001000734 Paired end Illumina sequencing of whole exomes of multiple tumour regions. Illumina HiSeq 2000;, Illumina HiSeq 2500;, Illumina Genome Analyzer IIx; 88 bam
EGAD00001001213 Illumina HiSeq 2000; 88 bam
EGAD00001001935 Cancer amplicon reads consisting of BAM paired end reads from primary multiple myeloma samples. Illumina MiSeq; 88 bam
EGAD00010001057 APCDR AGV Project: Array data from 88 Mandinka. Raw data, intensity files and post-QC Plink files. Illumina HumanOmni2-5_8v1_A 88
EGAD00001003431 High-coverage WGS of DNA samples from 45 GC pairs was performed on the Illumina HiSeq X Ten System. Illumina HiSeq 2000 (ILLUMINA) 88
EGAD00001002650 Somatic variants called from whole-exome sequencing of meningioma-blood pairs 87 vcf
EGAD00001002088 Whole-genome sequencing on Illumina HiSeq2000/2500 of colorectal cancer primary tumor samples Illumina HiSeq 2000; 87
EGAD00001002077 RNAseq on Illumina HiSeq2000/2500 of colorectal cancer primary tumor samples Illumina HiSeq 2000; 87
EGAD00001000189 UK10K_RARE_NEUROMUSCULAR REL-2012-02-22 Illumina HiSeq 2000; 86 vcf
EGAD00001000422 We perform whole exome sequencing on samples from a large IBD pedigree. The selected samples are from more distantly related family members (healthy and with IBD) and a set of matched population (Ashkenazi Jewish ancestry) samples. Illumina HiSeq 2000; 86 bam
EGAD00010001414 Raw Array data from the PRAD-CA for ICGC DCC Release26 Affymetrix OncoScan FFPE Express 86
EGAD00001000759 Illumina HiSeq 2000; 86 bam,fastq
EGAD00001001048 Samples from Edwards et al 2015 - doi:10.1186/s12864-015-1685-z Illumina HiSeq 2000 (ILLUMINA) 86 bam
EGAD00001001442 This project is to explore the contribution of de novo mutations to severe structural malformations diagnosed prenatally using ultrasound. These malformations include heart, CNS, renal and GI abnormalities. In this pilot project we aim to exome sequence 30 parent-foetus trios to ~50X mean coverage and identify de novo functional variants using an algorithm developed in the Hurles group. Illumina HiSeq 2000; 86 bam,cram
EGAD00001002066 KRAS mutant CRC is currently being tested in a clinical trial of a combination of a MEK and an Akt inhibitor. These patients will likely develop resistance to this combination. We aim to identify the mechanisms of resistance via ENU mutagenesis, with a view to identifying additional therapeutics which have the ability to overcome this resistance. Illumina HiSeq 2500; 86 cram
EGAD00001002143 We expanded our previous collection of longitudinal GBM patients (EGAS00001001041) by recruiting 21 additional patients. Tumor specimens were subjected to whole-exome sequencing (16 of 21 cases, with the matched normal/blood) and transcriptome sequencing (16 of 21 cases). Illumina HiSeq 2500; 86 fastq
EGAD00010001012 BLUEPRINT DNA Methylation 450K data of mantle cell lymphoma Illumina HumanMethylation 450K 86
EGAD00010001046 APCDR AGV Project: Array data from 86 Sotho. Raw data, intensity files and post-QC Plink files. Illumina HumanOmni2-5_8v1_A 86
EGAD00001003135 DATA FILES FOR GRUBER SJAMLM7 RNASEQ Illumina HiSeq 2000;ILLUMINA 86
EGAD00001003133 RRBS data of 86 Ewing patients (French). Illumina HiSeq 2000/2500 (Fastq files available). Sheffield et al. Nat Med. 2017 Jan 30 Illumina HiSeq 2000;ILLUMINA 86
EGAD00001000229 EGAD00001000229_UK10K_NEURO_ASD_FI_REL_2012_07_05 Illumina HiSeq 2000; 85 vcf
EGAD00001000173 UK10K_NEURO_ASD_FI REL-2012-01-13 Illumina HiSeq 2000; 85 vcf
EGAD00010000831 BLUEPRINT EpiMatch: harnessing epigenetics for hematopoietic stem cell transplantation Illumina Infinium HumanMethylation450 BeadChips 85
EGAD00001000311 UK10K_NEURO_ASD_FI REL-2012-11-27 Illumina HiSeq 2000; 84 vcf
EGAD00001000015 Exome sequencing of hyperplastic polyposis patients. Illumina HiSeq 2000, Illumina Genome Analyzer II, Illumina HiSeq 2000; 84 bam,srf
EGAD00001000435 UK10K_NEURO_ASD_FI REL-2013-04-20 Illumina HiSeq 2000; 84 vcf
EGAD00001000951 Whole exome sequencing data for ependymomas (42 tumor-control pairs). See Mack, Witt et al. Nature 506(7489):445-50, 2014 (PMID: 24553142). 84 bam
EGAD00001001222 TGCT Whole Exome Sequencing data Illumina HiSeq 2500; 84 bam
EGAD00001001921 All pituitary samples Illumina HiSeq 2500; 84 bam
EGAD00001002234 This study involves mutagenizing C32, a melanoma cell line, with ENU to identify those mutations which engender resistance to a targeted treatment. Illumina HiSeq 2000; 84 cram
EGAD00001003132 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: GACA-CN. 84
EGAD00001000386 Whole-genome libraries will be prepared from at least two serial samples reflecting different stages of disease progression and matched constitutional DNA for 30 Myelodysplastic syndrome patient samples. Five lanes of Illumina HiSeq sequencing will be performed on each of the tumour samples and four lanes for each of the constitutional DNA. Sequencing data will be mapped to build 37 of the human reference genome and analysis will be performed to characterize the spectrum of somatic variation present in these samples including single base pair mutations, insertions, deletions as well as larger structural variants and genomic rearrangements. Illumina HiSeq 2000; 83 bam,cram
EGAD00001002001 Mapped data (bam files) for high-throughput whole genome sequence data for 83 modern Aboriginal Australians 83 bam,bai
EGAD00001001282 McGill EMC Release 4 in tissue "venous blood" for cell type "Monocyte" Illumina HiSeq 2500; 82 fastq
EGAD00001001887 Exome sequencing VCF files describing mutations during glioma progression. 82 vcf
EGAD00001003263 ICGC DCC Release 24, PACA-CA Deep KRAS sequencing 82
EGAD00001000218 UK10K_RARE_SIR REL-2012-07-05 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 81 vcf
EGAD00001000354 Testing the feasibility of genome-scale sequencing in routinely collected formalin-fixed paraffin-embedded (FFPE) cancer specimens versus matched fresh-frozen samples using targeted pulldown capture prior to Illumina sequencing. Illumina HiSeq 2000; 81 bam
EGAD00001003410 ICGC PCAWG Dataset for RNA-Seq BAM aligned using Star. Project: PACA-AU. 81
EGAD00001003411 ICGC PCAWG Dataset for RNA-Seq BAM aligned using TopHat2. Project: PACA-AU. 81
EGAD00001003811 Our project will examine the role of PIK3CA mutations in sensitivity to endocrine therapies, alone and with the addition of complete ovarian suppression. We plan to test our hypotheses using tumour samples collected from patients enrolled in the SOFT/IBCSG24-02 clinical study (Suppression of Ovarian Function Trial, NCT00066690). SOFT is a phase III trial that randomised 3066 premenopausal women to evaluate whether adding ovarian suppression to adjuvant endocrine therapy improves clinical outcomes. This dataset contains all the data available for this study on 2017-11-22. Illumina HiSeq 2500;ILLUMINA 81
EGAD00001000135 Neuroblastoma whole genome sequencing Illumina HiSeq 2000 80 bam
EGAD00001000132 Mutational landscapes of primary triple negative breast cancers - RNA seq Illumina Genome Analyzer IIx, Illumina Genome Analyzer IIx; 80 bam
EGAD00001000771 We aim to provide a powerful reference set for genome-wide association studies (GWAS) in African populations. Our pilot study to sequence 100 individuals each from Fula, Jola, Mandinka and Wolof from the Gambia to low coverage has been completed - this first part of the main effort will make available low coverage WGS data for 400 individuals from multiple ethnic groups in Burkina Faso, Cameroon, Ghana and Tanzania. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2000; 80 bam,cram
EGAD00001000976 WGS DATA FILES FOR SJPhLike Illumina HiSeq 2000; 80 bam
EGAD00001000775 Whole exome sequencing of 41 melanomas and normal DNA from Braf mutant mice: 15 tumours from UV exposed mice, 15 tumours from non-exposed mice and 11 from UV exposed, sunscreen-protected mice. Illumina HiSeq 2000; 80 bam
EGAD00010000600 Prostate Adenocarcinomas samples using 450K Illumina450K 80
EGAD00001002235 Many studies over the past 10 years, culminating in the recent report of the International Stem Cell Initiative (ISCI, 2011) have shown that hPSC acquire genetic and epigenetic changes during their time in culture. Many of the genetic changes are non-random and recurrent, probably because they provide a selective growth advantage to the undifferentiated cells. Some are shared by embryonal carcinoma cells, the malignant counterparts of ES cells. The origins of these growth advantages are poorly understood, but may come from altered cell cycle dynamics, resistance to apoptosis or altered patterns of differentiation. Less is known about the nature and consequences of epigenetic changes, but it is likely that these similarly affect hPSC behaviour; e.g., enhanced expression of DLK1, an imprinted gene, is associated with altered hPSC growth (Enver et al 2005). Inevitably, these genetic and epigenetic changes will impact on our ability to use hPSC for regenerative medicine, either because malignant transformation of the undifferentiated cells or their differentiated derivatives to be used for transplantation compromises safety, or because they impede the function of those differentiated derivatives, or because they affect the efficiency with which the undifferentiated cells can be expanded and differentiated into desired cell types. Focusing initially upon the existing clinical grade hESC lines, later moving to iPSC, we will consolidate and extend knowledge of the rate, type and functional impact of the genetic variations that occur during hPSC culture. We will use whole genome and exome sequencing as well as SNP arrays, together with clonal analysis and other cytogenetics techniques. Common changes will be compared with those found in the normal human population, at low frequency in the original cell population or observed during iPSC generation in the HIPSCI project currently based at the WTSI.
These studies will provide a better understanding of the range of genetic changes that occur in hPSC beyond the CNVs already identified. In conjunction with cancer genome resources and expertise at WTSI, bioinformatic analyses of these hPSC data will allow us to assess potential impact on hPSC behaviour pertinent to applications in regenerative medicine, notably the likelihood that specific changes arising in undifferentiated PSC cultures may be associated with potential malignant transformation of differentiated progeny. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2000; 80 cram
EGAD00001003239 This study involves mutagenizing C32, a melanoma cell line, with ENU to identify those mutations which engender resistance to a targeted treatment. Illumina HiSeq 2000;ILLUMINA 80
EGAD00001003218 This study includes 80 brain cancer cases (160 samples) belonging to the GBM-CN project. Illumina HiSeq 2000;ILLUMINA 80
EGAD00001003456 There are 5 WGS and 35 WES sample pairs from the First Affiliated Hospital of Kunming Medical University, belonging to the ICGC project COCA-CN. Illumina HiSeq 2000;ILLUMINA 80
EGAD00001000223 RNA sequencing of SCLC tumor/normal sample pairs and cell lines Illumina HiSeq 2000; 79 fastq
EGAD00010001048 APCDR AGV Project: Array data from 79 Jola. Raw data, intensity files and post-QC Plink files. Illumina HumanOmni2-5_8v1_A 79
EGAD00001000410 We will perform exome sequencing on selected cases of splenic marginal zone lymphoma (SMZL) and diffuse large B-cell lymphoma (DLBCL) in order to characterise their genetic makeup and identify biomarkers for prognosis and prediction of treatment response. Illumina HiSeq 2000; 78 bam,cram
EGAD00001000699 Illumina HiSeq sequence data (with >80x coverage) were aligned to the hg19 human reference genome assembly using BWA (Li and Durbin, 2009); duplicate reads were removed from the final BAM file. No realignment or recalibration was performed. Illumina HiSeq 2000; 78 bam
EGAD00001002009 Exome sequencing of high-risk prostate cancer Illumina HiSeq 2000; 78 bam
EGAD00010001050 APCDR AGV Project: Array data from 78 Wolof. Raw data, intensity files and post-QC Plink files. Illumina HumanOmni2-5_8v1_A 78
EGAD00001003331 Whole-exome sequencing of a cohort of families (probands and affected/unaffected relatives) suffering from one of two rare thyroid disorders: congenital hypothyroidism (CH) and resistance to thyroid hormone (RTH). This dataset contains all the data available for this study on 2017-05-11. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 78
EGAD00010000260 PNET genotyping Illumina OmniQuad 2.5 - CNVpartition 77
EGAD00001000434 UK10K_NEURO_ASD_BIONED REL-2013-04-20 Illumina HiSeq 2000; 77 vcf
EGAD00001000436 UK10K_NEURO_ASD_GALLAGHER REL-2013-04-20 Illumina HiSeq 2000; 77 vcf
EGAD00001000772 We aim to provide a powerful reference set for genome-wide association studies (GWAS) in African populations. Our pilot study to sequence 100 individuals each from Fula, Jola, Mandinka and Wolof from the Gambia to low coverage has been completed - this first part of the main effort will make available low coverage WGS data for 400 individuals from multiple ethnic groups in Burkina Faso, Cameroon, Ghana and Tanzania. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2000; 77 cram
EGAD00001000854 DATA FILES FOR SJEPD Illumina HiSeq 2000; 77 bam
EGAD00001003096 As part of the International Parkinson's Disease Genomics Consortium, exomes of Parkinson's disease (PD) patients and healthy controls were sequenced to study the genetic etiology of PD. This UK cohort consists of 70 PD patients. Researchers can apply for access to fastq files for this cohort. Illumina HiSeq 2000;ILLUMINA 77 fastq
EGAD00001000310 UK10K_NEURO_ASD_BIONED REL-2012-11-27 Illumina HiSeq 2000; 76 vcf,bam
EGAD00001000737 Whole exome sequencing data from 30 donors (46 tumor and 30 non-tumor whole exomes; paired-end, HiSeq 2000, Illumina) collected by the Inserm U674, PI Jessica Zucman-Rossi - Institut National du Cancer (INCa), PI Fabien Calvo, France. Illumina HiSeq 2000; 76 bam
EGAD00001001015 Illumina HiSeq 2000; 76 bam
EGAD00001001864 DATA FILES FOR PCGP MB WGS - Supersedes (EGAD00001000269) Illumina HiSeq 2000; 76 bam
EGAD00001001459 Transcriptome sequencing of tumour tissue, adjacent normal tissue and derived organoids/tumoroids from colorectal cancer. This dataset contains all the data available for this study on 2015-08-05. Illumina HiSeq 2000; 76 cram
EGAD00001001339 We propose to definitively characterise the somatic genetics of breast cancer through generation of comprehensive catalogues of somatic mutations in breast cancer cases by high coverage genome sequencing coupled with integrated transcriptomic and methylation analyses. Illumina HiSeq 2000; 76 bam,cram
EGAD00001003347 This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ This dataset contains all the data available for this study on 2017-05-24. Illumina HiSeq 2000;ILLUMINA 76
EGAD00001003119 TP53 targeted panel aligned reads consisting of BAM paired end reads from ovarian cancer tumor samples Data Access Committee Illumina MiSeq;ILLUMINA 76
EGAD00001002104 Whole-exome sequencing on the AB 5500xl Genetic Analyzer of EDTA blood samples AB 5500xl Genetic Analyzer; 76
EGAD00001000179 UK10K_RARE_COLOBOMA REL-2012-01-13 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 75 vcf
EGAD00001000316 UK10K_NEURO_ASD_GALLAGHER REL-2012-11-27 Illumina HiSeq 2000; 75 vcf
EGAD00010000462 SJLGG Case samples using Gene Expression Array Affymetrix_U133v2 75
EGAD00001000793 RNA sequencing (RNA-Seq) for St. Jude High Grade Glioma (HGG) study Illumina HiSeq 2000; 75 bam
EGAD00001001269 Exome bam files of 75 Individuals From Multiply Affected Coeliac Families Illumina Genome Analyzer II;, Illumina Genome Analyzer IIx; 75 bam
EGAD00001001451 JMML targeted sequencing of candidate genes Illumina MiSeq; 75 bam
EGAD00001003158 Bam files consisting of aligned MeDIP-seq reads from cord blood cells and cord blood mononuclear cells of twins conceived through in vitro fertilisation Illumina Genome Analyzer II;ILLUMINA 75
EGAD00010000272 Colon tumour samples Illumina_2.5M 75
EGAD00010000546 SNP 6.0 arrays of carcinoid samples Affymetrics_SNP_6.0- 74
EGAD00001002153 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: PAEN-IT. 74 bam,bai,readme_file
EGAD00010001054 APCDR AGV Project: Array data from 74 Fula Illumina HumanOmni2-5_8v1_A 74
EGAD00001003212 We aim to provide a powerful reference set for genome-wide association studies (GWAS) in African populations. Our pilot study to sequence 100 individuals each from Fula, Jola, Mandinka and Wollof from the Gambia to low coverage has been completed - this first part of the main effort will make available low coverage WGS data for 400 individuals from multiple ethnic groups in Burkina Faso, Cameroon, Ghana and Tanzania. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ HiSeq X Ten;ILLUMINA 74
EGAD00001003275 Targeted resequencing of samples was done with the TruSeq Custom Amplicon Low Input kit (TSCA-LI, Illumina). The oligo capture probes were designed to include a prefix of 8 random nucleotides at the 5' end of each probe. The assay is designed so that each targeted locus is annealed with two probes, resulting in amplicons tagged with unique molecular identifiers (UMIs) (22) of 16 bases. Raw FASTQ sequencing files were processed as follows: (a) The first 8 bases were trimmed from each read and recorded, with the corresponding base quality scores (BQ), in the attribute field. (b) Reads were aligned with BWA. (c) A first round of PCR duplicate removal was performed with Picard MarkDuplicates using the parameters BARCODE_TAG=BC TAGGING_POLICY=All REMOVE_DUPLICATES=true. (d) Because the previous step removed only duplicate reads with identical UMIs, a second pass of filtering was done: reads with identical mapping were considered unique only if their corresponding UMIs differed in at least 3 positions (i.e., UMI edit distance > 2). (e) Paired-end read pairs overlapping the same genomic positions were clipped with bamUtil clipOverlap to avoid overestimating the sequencing coverage. NextSeq 550;ILLUMINA 74
EGAD00001003126 WGS data of medulloblastoma tumor/control pairs. 74
EGAD00010000274 Colon matched tumour samples Illumina_2.5M 74
EGAD00001003548 ICGC PCAWG Dataset for RNA-Seq BAM aligned using TopHat2. Project: CLLE-ES. 74
EGAD00001003549 ICGC PCAWG Dataset for RNA-Seq BAM aligned using Star. Project: CLLE-ES. 74
EGAD00001000617 Pilocytic Astrocytoma ICGC PedBrain RNA sequencing Illumina HiSeq 2000; 73 fastq
EGAD00001001430 Investigation into causal genes underlying anaplastic meningioma Illumina HiSeq 2000; 73 cram
EGAD00001000230 EGAD00001000230_UK10K_NEURO_ASD_GALLAGHER_REL_2012_07_05 Illumina HiSeq 2000; 72 vcf
EGAD00001000667 Illumina HiSeq 2000; 72 bam
EGAD00001000712 Illumina HiSeq 2000; 72 bam
EGAD00001000293 Sequencing data for Australian Ovarian Cancer study submitted 20121116 AB SOLiD 4 System; 72 bam
EGAD00001001661 Genotype and exome data for an Australian Aboriginal population: a reference panel for health-based research. 72 vcf
EGAD00001001397 We sequenced 292 patients with NSCLC using whole-genome or exome sequencing. Illumina Genome Analyzer II;, Illumina HiSeq 2000; 72 fastq
EGAD00001003355 From 17 patients undergoing knee joint replacement surgery for osteoarthritis, we collected 4 samples each: intact cartilage, degraded cartilage, synovium, and meniscus. We also collected blood for DNA analysis. Multiplexed libraries were sequenced on Illumina HiSeq 2000 (75bp paired-end read length) and a cram file was produced for each sample. This dataset contains all the data available for this study on 2017-06-09. Illumina HiSeq 2500;ILLUMINA 72
EGAD00001000186 UK10K_RARE_HYPERCHOL REL-2012-02-22 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 71 vcf
EGAD00001001249 WES of HCC on HiSeq 2000: 71 samples in total, including hepatocellular carcinoma cell lines and normal samples (peripheral blood or tissue adjacent to the cancer). Illumina HiSeq 2000; 71 fastq
EGAD00001002207 Our aim is to identify genes involved in resistance to anti-cancer therapies. To do this, we have taken advantage of a lentiviral vector (LV)-based insertional mutagen to mutagenize cancer cell lines. LV-transduced cell lines were then treated with anti-cancer therapies and the emergence of resistant clones was scored. DNA from pools of resistant clones was collected, subjected to custom capture by baits designed against the LV sequence, and then sequenced to identify the LV-genomic junction. We hope that the identification of recurrently targeted genes in resistant cell populations will allow us to identify genes that mediate drug resistance. Illumina MiSeq;, Illumina HiSeq 2500; 71 cram
EGAD00001002693 Innate immune memory is the phenomenon whereby innate immune cells such as monocytes or macrophages undergo functional reprogramming after exposure to microbial components such as LPS. We apply an integrated epigenomic approach to characterize the molecular events involved in LPS-induced tolerance in a time-dependent manner. ChIP-seq, RNA-seq, WGBS and ATAC-seq data were generated. This analysis identified epigenetic programs in tolerant and trained macrophages, and the potential transcription factors involved. Experimental set-up: time-course in vitro culture of human monocytes. Two innate immune memory states can be induced in culture through an initial exposure of primary human monocytes to either LPS or BG for 24 hours, followed by removal of the stimulus and differentiation to macrophages for an additional 5 days. Cells were collected at baseline (day 0), 1 hour, 4 hours, 24 hours and 6 days. unspecified;, Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2000;, NextSeq 500; 71 fastq,bam
EGAD00001002717 In this study we characterized genomic alterations in two to five metachronous bladder tumors from 29 patients initially diagnosed with early-stage disease. Fourteen patients (32 tumors) had non-progressive disease (NPD) and 15 patients (34 tumors) had progressive disease (PD). Whole-exome sequencing (WES, ~50x mean read depth) and whole-transcriptome RNA-seq were performed (RNA was not available for 4 tumors). Data provided here consist of 71 unmapped BAM files from whole-transcriptome RNA-seq. Illumina HiSeq 2000;ILLUMINA 71 bam
EGAD00001002718 In this study we characterized genomic alterations in two to five metachronous bladder tumors from 29 patients initially diagnosed with early-stage disease. Fourteen patients (32 tumors) had non-progressive disease (NPD) and 15 patients (34 tumors) had progressive disease (PD). Whole-exome sequencing (WES, ~50x mean read depth) and whole-transcriptome RNA-seq were performed (RNA was not available for 4 tumors). Data provided here consist of 71 mapped BAM files from whole-transcriptome RNA-seq. Illumina HiSeq 2000;ILLUMINA 71 bam
EGAD00001000327 release_2: ICGC PedBrain: whole genome mate-pair sequencing Illumina HiSeq 2000;, Illumina Genome Analyzer IIx; 70 bam,fastq
EGAD00001001330 In this experiment we have sequenced tumour normal pairs from patients presenting with CRC who have a prior history of inflammatory bowel disease. The idea is to identify driver mutations, new genes and novel pathways associated with the development of these malignancies. Illumina HiSeq 2000; 70 cram
EGAD00001001399 Data represent genome-wide DNA methylation profiles obtained by MethylCap-seq (Diagenode’s MethylCap-kit based purification followed by Illumina GAIIx sequencing), for 70 brain tissue samples, including 65 glioblastoma samples and 5 non-tumoral tissues (obtained from epilepsy surgery). Illumina Genome Analyzer IIx; 70 fastq
EGAD00010000829 Illumina Infinium 450K array data 70
EGAD00001002111 70 whole-exome sequences from 9 patients with DIPG for the project Spatial and Temporal Homogeneity of Driver Mutations in Diffuse Intrinsic Pontine Glioma Illumina HiSeq 2500; 70 bam
EGAD00001003265 For CCOC cohorts, OvCaRe cases, including frozen material, were reviewed by at least two expert gynecopathologists, who confirmed the final selected cohort prior to inclusion in the sequencing cohort. Frozen H&E sections from Tokyo were also used for evaluation, along with representative H&E photos and a review done at the Jikei School of Medicine. All CCOC tumours are primary tumour samples. Library construction and sequencing: Frozen specimens with >50% tumour cellularity (based on initial slide review) were used for cryosectioning and subsequent nucleic acid extraction. Patient tumour and normal blood samples were derived from primary, untreated fresh-frozen tumour specimens harvested at diagnosis during standard-of-care debulking surgery. Germline DNA was obtained from peripheral blood buffy coat for all specimens except 13 from Tokyo, where non-cancer frozen tissue was used as the germline source. DNA extraction from both matched normal (blood) and tumour samples (frozen tissue) was performed using the QIAamp Blood and Tissue DNA kit (Qiagen), and DNA was quantified using a Qubit fluorometer and reagents (high-sensitivity assay). Three lanes of Illumina HiSeq 2500 v4 chemistry were obtained for normal samples and five lanes for tumour samples. The PCR-free protocol was adopted to eliminate PCR-induced bias and improve coverage across the genome. Illumina HiSeq 2000;ILLUMINA, Illumina Genome Analyzer II;ILLUMINA 70
EGAD00001000795 Fernandez-Cuesta et al., 2014, Nature Communications, RNA sequencing data set Illumina HiSeq 2000; 69 fastq
EGAD00001000708 AZIN1 amplicon sequencing data of the EGAS00001000495 project. 454 GS FLX Titanium; 69 fastq
EGAD00001001381 Illumina HiSeq 2000; 69 fastq
EGAD00001002082 Whole-genome sequencing on Illumina HiSeq2000/2500 of Blood EDTA Illumina HiSeq 2000; 69
EGAD00001000269 OLD DATA FILES FOR SJMB - Superseded by EGAD00001001864 Illumina HiSeq 2000; 68 bam
EGAD00001000429 UK10K_OBESITY_TWINSUK REL-2013-04-20 Illumina HiSeq 2000; 68 vcf
EGAD00001000694 This is an ongoing project and a continuation of the sequencing we have been doing over the last few years. We have some additional families and probands with syndromes of insulin resistance not previously sequenced within UK10K or other core-funded projects. We would like to complete the sequencing in all of the good-quality families and probands we have, which would require another ~50 samples to undergo WES. This cohort has already proven to be a rich source of interesting findings, with papers in Science and Nature Genetics. Illumina HiSeq 2000; 68 bam,cram
EGAD00001000411 These samples include exome sequences of family members of northern Finnish origin with dyslipidemias. Illumina HiSeq 2000; 68 bam
EGAD00001000724 Illumina HiSeq 2000; 68 bam
EGAD00001000627 Transcriptome sequencing data of tumor and 10 matched normal samples of the EGAS00001000495 project Illumina HiSeq 2000; 68 fastq
EGAD00001002117 Raw data (fastq files) from targeted resequencing of AML patients at diagnosis Illumina MiSeq; 68 fastq
EGAD00010001040 Methylation changes in OA patients with chronic exposure to cobalt and chromium Illumina HumanMethylation450 68
EGAD00010001287 Array methylation profiling of knee osteoarthritis patients who have undergone total joint replacement Illumina HumanMethylation450K 68
EGAD00001003186 Variants on the Y chromosome for 62 Danish males, in VCF format, from the GenomeDenmark Phase 2 cohort. Variants were called using reference-based approaches, such as the HaplotypeCaller module from GATK, and by aligning de novo assemblies to the reference using ASMvar. 68
EGAD00001000106 Primary Myelofibrosis Myeloproliferative Disease exome sequencing Illumina Genome Analyzer II, Illumina HiSeq 2000; 67 bam
EGAD00010001162 Oncotrack primary tumor samples using 450K. The dataset includes shared AF analysis files oncotrackDNAmAnalysis.R and oncotrackDNAmBetaScores.txt which were applied for both Oncotrack_450K_tumor (EGAD00010001162) and Oncotrack_450K_metastatic (EGAD00010001161) datasets. Illumina 450K 67
EGAD00001000342 This project aims to find causal variants in 50 patients diagnosed with Microcephalic Osteodysplastic Primordial Dwarfism (MOPD) of presumed recessive inheritance, by performing whole-exome sequencing to ~50x mean depth. This is a collaboration with Prof A. Jackson, MRC Human Genetics Unit, Edinburgh. Illumina Genome Analyzer II;, Illumina HiSeq 2000; 66 bam
EGAD00001000628 Illumina HiSeq 2000;, Illumina Genome Analyzer IIx; 66 bam
EGAD00001001035 RIKEN collection WGS and RNA-seq reads for 66 HBV-associated HCC and matched blood or liver samples from 22 donors. Illumina HiSeq 2000;, Illumina Genome Analyzer IIx; 66 fastq
EGAD00001001937 Targeted sequencing of 48 amplicons in the TP53, PTEN, EGFR, PIK3CA, KRAS and BRAF genes was performed as described previously [Forshew, STM 2012]. All libraries were pooled and quantified using the DNA 1000 kit on an Agilent 2100 Bioanalyzer and the KAPA SYBR FAST ABI Prism qPCR Kit (KAPA Biosystems) on a 7900HT Fast Real-Time PCR System (Applied Biosystems) according to the supplier's recommendations. Reads were aligned using bwa-mem v0.7.12-r1039 to the 1000 Genomes version of human genome build GRCh37, retaining duplicate reads. Illumina MiSeq; 66 bam
EGAD00001002107 Whole-exome sequencing on AB 5500xl Genetic Analyzer of colorectal cancer primary tumor sample AB 5500xl Genetic Analyzer; 66
EGAD00010001079 Affymetrix SNP6.0 array breast cancer data Affymetrix SNP6.0 66
EGAD00001003244 We aim to sequence the mRNA transcriptome of 22 human melanoma cell lines in biological triplicate in order to define the gene expression profile of each cell line. The data will be correlated with the mutation status and the sensitivity to a panel of drugs in order to identify genes whose deregulation is associated with drug resistance. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2000;ILLUMINA 66
EGAD00001003245 We aim to sequence the small RNAs of 22 human melanoma cell lines in biological triplicate in order to define the microRNA expression profile of each cell line. The data will be correlated with the mutation status and the sensitivity to a panel of drugs in order to identify genes whose deregulation is associated with drug resistance. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2500;ILLUMINA 66
EGAD00001003310 This project, which belongs to LAML-CN, contains 66 pairs of LAML cases sequenced by Complete Genomics. Libraries were constructed using the Complete Genomics protocol. Complete Genomics;COMPLETE_GENOMICS 66
EGAD00001000208 UK10K_RARE_THYROID REL-2012-07-05 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 65 vcf
EGAD00001000187 UK10K_RARE_THYROID REL-2012-02-22 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 65 vcf
EGAD00001003100 UKBEC 1st release of Exome data for 65 neuropathologically confirmed control individuals of European descent. Illumina HiSeq 2000;ILLUMINA 65
EGAD00001000380 Illumina paired-end sequencing of whole-exome pulldown DNA from severely insulin-resistant patients. Illumina Genome Analyzer II;, Illumina HiSeq 2000; 64 bam
EGAD00010000238 CLL Expression array Affymetrix GeneChip Human Genome U133 plus 2.0 64
EGAD00001000901 The dataset includes the whole-exome sequencing data from 32 pairs of gallbladder cancer tissues and patient-matched normal tissues. Illumina HiSeq 2500; 64 bam
EGAD00001000894 SPECTA comprises a network of participating European clinical sites and NGS screening platforms that can screen individual patients for multiple molecular targets and potentially allow the design of trials that will match the specific biology of the diseases affecting specific patients with cancer. Illumina HiSeq 2500; 64 cram
EGAD00001001120 Whole Genome Sequencing Illumina HiSeq 2000;, Illumina HiSeq 2500; 64 fastq
EGAD00001002011 RNA sequencing data of whole blood samples from smoking and non-smoking mothers and their children at gestation/birth and follow-up years. 64 bam
EGAD00001002202 This dataset contains the FASTQ and BAM files for 64 samples. The study group consisted of 17 obese women with normal glucose tolerance and 15 obese women with T2DM, classified according to WHO standards. The groups were matched for age, BMI and waist circumference. All the women had been morbidly obese (BMI>40 kg/m2) for at least five years. Illumina HiSeq 2000; 64 bam,fastq
EGAD00001002665 Mapped sequence reads in BAM format for 64 individuals reporting Kanak ancestry recruited in New Caledonia sequenced at four times target coverage using the Illumina HiSeq 4000 platform. 64 bam,bai
EGAD00001000188 UK10K_RARE_SIR REL-2012-02-22 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 63 vcf
EGAD00010000526 SNP 6.0 arrays of small cell lung cancer Affymetrics_SNP_6.0- 63
EGAD00001000806 Whole Genome Sequencing (WGS) for St. Jude High Grade Glioma (HGG) study Illumina HiSeq 2000 (ILLUMINA) 63 bam
EGAD00001001098 DATA FILES FOR SJINF RNASeq Illumina HiSeq 2000; 63 bam
EGAD00001001411 RNA sequencing of 57 tumor samples of adult T-cell leukemia/lymphoma as well as 3 samples of HTLV-1 carrier and 3 samples of healthy volunteers. Illumina HiSeq 2000; 63 bam
EGAD00001002732 DNA methylation was analyzed for stem/progenitor cell types and terminally differentiated cell types of the human blood lineage (HSC, MPP, CMP, MEP, GMP, CLP, MLP0, MLP1, MLP2, MLP3, MK, CD4+ Tcell, CD8+ Tcell, Bcell, NK, Neut, Mono). Illumina HiSeq 4000;ILLUMINA 63 bam
EGAD00001003822 The dataset comprises 8 breast cancer, 11 ovarian cancer, 1 benign tumour, 18 normal tissue, 2 endometrium, and 23 white blood cell samples. Genome wide methylation analysis was performed by Reduced Representation Bisulfite Sequencing (RRBS) on Illumina HiSeq 2500. Data is provided as FASTQ files Illumina HiSeq 2500;ILLUMINA 63
EGAD00010000417 Han Chinese samples using Illumina OMNIExpress (cases) Illumina OMNIExpress 62
EGAD00010000419 Han Chinese samples using Affymetrix (cases) Affymetrix_6.0 62
EGAD00001000869 It is the ambition of the team formed by members of the Netherlands Cancer Institute (NKI) and the Cancer Genome Project at the Wellcome Trust Sanger Institute (WTSI) to unravel the genomic and phenotypic complexity of human cancers in order to identify optimal drug combinations for personalized cancer therapy. Our integrated approach will entail (i) deep sequencing of human tumours and cognate mouse tumours; (ii) drug screens in a 1000+ fully characterized tumour cell line panel; (iii) high-throughput in vitro and in vivo shRNA and cDNA drug resistance and enhancement screens; (iv) computational analysis of the acquired data, leading to significant response predictions; (v) rigorous validation of these predictions in genetically engineered mouse models and patient-derived xenografts. This integrated effort is expected to yield a number of combination therapies and companion-diagnostics biomarkers that will be further explored in our existing clinical trial networks. Illumina HiSeq 2000; 62 cram
EGAD00001000891 We propose to definitively characterise the somatic genetics of Prostate cancer through generation of comprehensive catalogues of somatic mutations by high coverage genome sequencing. See ICGC website for more information: http://icgc.org/icgc/cgp/70/508/71331 Illumina HiSeq 2000; 62 bam
EGAD00001001460 Whole-exome sequencing of a cohort of families (probands and affected/unaffected relatives) suffering from one of two rare thyroid disorders: congenital hypothyroidism (CH) and resistance to thyroid hormone (RTH). This dataset contains all the data available for this study on 2015-08-05. Illumina HiSeq 2000; 62 cram
EGAD00010000869 RNA expression microarray Illumina_HumanHT-12v4 62
EGAD00010001001 Primary renal cell carcinoma (RCC), RCC metastases and cell lines by Illumina 450K Illumina 450K 62
EGAD00001002662 ICGC PCAWG Dataset for WGS BAM aligned using BWA MEM. Project: LINC-JP. 62 readme_file,bai,bam
EGAD00001000083 Recurrent Somatic Mutations in CLL Illumina Genome Analyzer II;, Illumina Genome Analyzer IIx 61 fastq
EGAD00001000116 Acute Lymphoblastic Leukemia Sequencing Illumina HiSeq 2000, Illumina Genome Analyzer II, Illumina HiSeq 2000; 61 bam,srf
EGAD00001000809 RIKEN collection WGS reads for 61 liver cancer and matched blood samples from 30 donors displaying biliary phenotype Illumina HiSeq 2000; 61 fastq
EGAD00001001928 This study will analyse the guide sequences that were used to make mutations in the Cas9-expressing cells. We used the GeCKO v2 library released by Feng Zhang in 2014. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2500; 61 cram
EGAD00001001665 LGG Epilepsy Cohort WXS Illumina HiSeq 2000; 61 bam
EGAD00001002771 Modified alignments based on a single sample, each with different characteristics 61 bam
EGAD00010001177 This dataset contains SNP-array data for 61 tumors from 15 patients with EGFR-mutant lung adenocarcinoma. Illumina 61
EGAD00001000242 EGAD00001000242_UK10K_NEURO_ASD_MGAS_REL_2012_07_05 Illumina HiSeq 2000; 60 vcf
EGAD00001000392 Agilent whole-exome hybridisation capture was performed on genomic DNA derived from chondrosarcoma and matched normal DNA from the same patients. Next-generation sequencing was performed on the resulting exome libraries, and reads were mapped to build 37 of the human reference genome to facilitate the identification of novel cancer genes. We now aim to confirm and validate the findings from those exome libraries using bespoke pulldown methods and sequencing of the products. Illumina MiSeq; 60 bam
EGAD00001000608 PCR products were obtained from each target locus using genomic DNA from human iPS cells. The PCR products were then pooled and subjected to Illumina library preparation. The library will be sequenced on either HiSeq or MiSeq. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina MiSeq; 60 bam
EGAD00001000868 FFPE CPA accreditation of genome-scale sequencing in routinely collected formalin-fixed paraffin-embedded (FFPE) cancer specimens versus matched fresh-frozen samples using targeted pulldown capture prior to Illumina sequencing. Illumina HiSeq 2000; 60 cram
EGAD00001000885 Exome read sequences for 30 tumor-normal pairs for the study "Diverse modes of genomic alterations in Hepatocellular Carcinoma". Illumina HiSeq 2000; 60 fastq
EGAD00001001267 Anaplastic meningiomas are a rare, malignant variant of meningioma. At present there is no effective treatment for this cancer. The aim of the study is to identify somatic mutations in anaplastic meningiomas. We plan to sequence a set of 500 known cancer genes in 50 anaplastic meningioma and corresponding peripheral blood DNA samples. Bioinformatics will be used to analyse the results to assess the probability of these mutations being causal and so likely of critical importance for the tumour growth. Identification of these mutations will guide selection of appropriate compounds to effectively treat the disease. HiSeq X Ten; 60 cram
EGAD00001000986 Pheochromocytomas and paragangliomas (PCC/PGL) are neural crest-derived tumors with a very strong genetic component. We report the first integrated genomic portrayal of a large collection of PCC/PGL. SNP array analysis revealed distinct copy-number patterns associated with genetic background. Whole-exome sequencing showed a low mutation rate of 0.3 mutations per megabase, with few recurrent somatic mutations in genes not previously associated with PCC/PGL. DNA methylation arrays and miRNA sequencing identified DNA methylation changes and miRNA expression clusters strongly associated with mRNA expression profiling. Overexpression of the miRNA cluster 182/96/183 was specific to SDHB-mutated tumors and induced invasive traits, whereas silencing of the imprinted DLK1-MEG3 miRNA cluster appeared as a potential driver in a subgroup of sporadic tumors. Altogether, the complete genomic landscape of PCC/PGL is mainly driven by distinct germline and/or somatic mutations in susceptibility genes and reveals different molecular entities, characterized by a set of unique genomic alterations. Illumina HiSeq 2000; 60 bam
EGAD00001001938 DNA from each sample (100 ng) was sheared on a Covaris S220 (Covaris): duty cycle 10%, intensity 5.0, bursts per second 200, duration 300 s, mode frequency sweeping, power 23V, temperature 5.5 °C to 6 °C, water level 13. Libraries were prepared with the TruSeq Nano DNA LT Sample Prep Kit (Illumina) using a modified protocol: Sample Purification Beads were replaced by Agencourt AMPure XP beads (Beckman Coulter), and size selection after End Repair was done to remove only the short fragments. The quality and quantity of the constructed libraries were assessed with the DNA 7500 kit on an Agilent 2100 Bioanalyzer and with the KAPA Quantification kit (KAPA Biosystems) on a 7900HT Fast Real-Time PCR System (Applied Biosystems), respectively, according to the supplier's recommendations. Libraries from 18 barcoded samples were pooled in equimolar amounts, and each pool was loaded on a single lane of a HiSeq Single End Flowcell (Illumina), followed by cluster generation on a cBot (Illumina) and sequencing on a HiSeq 2500 (Illumina) in single-read 50 bp mode. Reads were aligned using bwa-mem v0.7.12-r1039 [10] to the 1000 Genomes version of human genome build GRCh37. Picard (http://picard.sourceforge.net) was used to remove duplicate reads. Illumina HiSeq 2500; 60 bam
EGAD00001002253 Thirty cutaneous SCC WES tumour samples with matched normals include 20 samples from South et al. (JID) and 10 new samples. These 30 samples have been used to support the findings in the TGFb Nature Communications paper (DOI: 10.1038/ncomms12493). They are also part of an ongoing study of the cSCC genomic landscape comprising 40 cSCC samples in total. Illumina HiSeq 2500; 60 bam
EGAD00001002696 Recurrent breast cancer is almost universally fatal. We characterize locally relapsed or distant metastatic cancers from 170 patients using massively parallel sequencing. We find that the relapse-seeding clone disseminates late from the primary tumor. TP53 and AKT1 mutations appear to be enriched in ER-positive cancers predisposed to relapse. Mutation acquisition continues at relapse: the same mutational signatures continue to operate, and new signatures, such as that caused by radiotherapy, appear de novo. In 49% of cases we identify driver mutations private to the relapse, and these are drawn from a wider range of cancer genes, including the SWI-SNF complex and JAK-STAT signaling. HiSeq X Ten;ILLUMINA, Illumina HiSeq 2000;ILLUMINA 60 bam,cram
EGAD00001002726 Cluster headache is a relatively rare headache disorder, typically characterized by multiple daily, short-lasting attacks of excruciating, unilateral (peri-)orbital or temporal pain associated with autonomic symptoms and restlessness. To better understand the pathophysiology of cluster headache, we used RNA sequencing to identify differentially expressed genes and pathways in whole blood of patients with episodic (n = 19) or chronic (n = 20) cluster headache in comparison with headache-free controls (n = 20). Illumina HiSeq 4000;ILLUMINA 60
EGAD00001002883 RNA-seq on Illumina HiSeq 2000/2500 of patient-derived xenografts (PDX) derived from colorectal cancer samples in a validation cohort of 60 PDX Illumina HiSeq 2000;ILLUMINA 60
EGAD00001000228 EGAD00001000228_UK10K_NEURO_ASD_BIONED_REL_2012_07_05 Illumina HiSeq 2000; 59 vcf
EGAD00001000984 This is the Whole Exome Sequencing (WES) data from 59 samples from 11 patients with lung adenocarcinomas including 48 tumor samples and 11 peripheral white blood cell samples Illumina HiSeq 2000; 59 bam
EGAD00001001244 RNA-sequencing (RNA-seq) was performed with RNA extracted from fresh-frozen human tumor tissue samples. cDNA libraries were prepared from poly-A selected RNA applying the Illumina TruSeq protocol for mRNA. The libraries were then sequenced with a 2 x 100bp paired-end protocol to a minimum mean coverage of 30x of the annotated transcriptome. Illumina HiSeq 2000; 59 fastq
EGAD00001001643 RIKEN collection of WGS reads of 59 multicentric liver cancers or intrahepatic metastases and matched blood samples from 19 donors. Illumina HiSeq 2000; 59 fastq
EGAD00001001323 A comprehensive characterisation and analysis of human breast cancers using genome-wide transcriptomic approaches. Illumina HiSeq 2000; 59 bam
EGAD00001002237 The disordered transcriptomes of cancer encompass direct effects of somatic mutation on transcription; co-ordinated secondary alterations in transcriptional pathways; and increased transcriptional noise. To catalogue the rules governing how somatic mutation Overall, 59% of 6980 exonic substitutions were expressed. Compared to other classes, nonsense mutations showed lower expression levels than expected with patterns characteristic of nonsense-mediated decay. 14% of 4234 genomic rearrangements caused transcriptional abnormalities, including exon skips, exon reusage, fusion transcripts and premature poly-adenylation. We found productive, stable transcription from sense-to-antisense gene fusions and gene-to-intergenic rearrangements, suggesting that these mutation classes may drive more transcriptional disruption than previously suspected. Systematic integration of transcriptome with genome data therefore reveals the rules by which transcriptional machinery interprets somatic mutation. Illumina HiSeq 2000;ILLUMINA, Illumina Genome Analyzer II;ILLUMINA 59 bam,srf
EGAD00001003531 HipSci - Bardet-Biedl Syndrome - RNA Sequencing - July 2017 Illumina HiSeq 2500 (ILLUMINA) 59
EGAD00001000092 Cancer Exome Resequencing Illumina Genome Analyzer II 58 bam
EGAD00001000145 Matched Pair Cancer Cell line Whole Genomes Illumina HiSeq 2000, Illumina HiSeq 2000; 58 bam
EGAD00001000138 The expression data for this study can be found here: http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1088/ and its SNP6 data can be found here: http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1087/ Illumina HiSeq 2000, Illumina Genome Analyzer II 58 bam,srf
EGAD00001000985 This is the targeted capture deep sequencing (TCS) data for validation of the mutations discovered in the WES step. There are 58 bam files of TCS data including 48 tumor samples and 10 peripheral blood WBC samples. Illumina HiSeq 2000; 58 bam
EGAD00001001662 Whole genome sequences of ACC primagrafts, Histone modification maps and transcription factor binding maps for ACC primagrafts and primary tumors. Processed ChIP-seq data is available on GEO under accession number GSE76465. Illumina MiSeq;, Illumina HiSeq 2000;, NextSeq 500;, Illumina HiSeq 2500; 58 bam,fastq
EGAD00001000333 Cancer is driven by mutations in the genome. We will uncover the mutations that give rise to Ewing's sarcoma, a bone tumour that largely affects children. We will use second generation Illumina massively parallel sequencing, and bespoke software, to characterise the genomes and transcriptomes of Ewing's sarcoma tumours. Illumina HiSeq 2000; 58
EGAD00001002278 58 vcf
EGAD00001003266 For ENOC cohorts, OvCaRe cases, including frozen material, were reviewed prior to inclusion in the sequencing cohort by at least two expert gynecopathologists, who confirmed the final selected cohort. Frozen H&E from Tokyo were also used for evaluation, along with representative H&E photos and review done at the Jikei School of Medicine. For ENOC, DAH985 and DG1288 are recurrent, and both were treated with chemotherapy after their first surgery. DAH123 is an untreated sample, a metastasis from a primary endometrial tumour. All HGSC, GCT and CCOC tumours and the remaining ENOC tumours are primary tumour samples. Library construction and sequencing: frozen specimens with >50% tumour cellularity (based on initial slide review) were used for cryosectioning and subsequent nucleic acid extraction. Patient tumour and normal blood samples were derived from primary, untreated fresh frozen tumour specimens harvested at diagnosis during standard of care debulking surgery. Germline DNA was provided from peripheral blood buffy coat for all specimens except 13 from Tokyo, where non-cancer frozen tissue was used as a germline source. DNA extraction from both matched normal (blood) and tumour (frozen tissue) samples was performed using the QIAamp Blood and Tissue DNA kit (Qiagen), and DNA was quantified using a Qubit fluorometer and reagents (high-sensitivity assay). Three lanes of Illumina HiSeq 2500 v4 chemistry were obtained for normal samples and five lanes for tumour samples. The PCR-free protocol was adopted to eliminate PCR-induced bias and improve coverage across the genome. Illumina HiSeq 2000;ILLUMINA 58
EGAD00001003512 This dataset includes bam files from 58 samples. These bam files include all read pairs where at least one of the reads aligns within 1kb of the HTT repeat expansion. These samples were sequenced using 2x150bp reads on an Illumina HiSeqX sequencer and aligned using bwa. Twelve of the samples used TruSeq Nano library preparation and 46 samples used TruSeq DNA PCR-free sample preparation. HiSeq X Ten;ILLUMINA 58
EGAD00001003448 Strand-specific RNA-seq data from 19 gastric tumors and their adjacent normal tissues, plus 16 gastric cancer cell lines, one normal gastric cell line, and 3 normal stomach RNAs Illumina HiSeq 2500;ILLUMINA 58
EGAD00001000707 Discovery of resistance mechanisms to the BRAF inhibitor vemurafenib in metastatic BRAF mutant melanoma by massively-parallel sequencing of tumour samples. Comparison of genomic characteristics of pretreatment 'sensitive' to recurrence 'resistant' tumours to identify the genetics of drug resistance. Illumina HiSeq 2000; 57 cram
EGAD00001000739 We aim to provide a powerful reference set for genome-wide association studies (GWAS) in African populations. Our pilot study to sequence 100 individuals each from Fula, Jola, Mandinka and Wollof from the Gambia to low coverage has been completed - this first part of the main effort will make available low coverage WGS data for 400 individuals from multiple ethnic groups in Burkina Faso, Cameroon, Ghana and Tanzania. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2000; 57 cram
EGAD00001001011 Monocyte differentiation into macrophages represents a cornerstone process for host defense. Concomitantly, immunological imprinting of either tolerance or trained immunity determines the functional fate of macrophages and susceptibility to secondary infections. Transcriptomes (RNA-Seq) and epigenomes (ChIP-Seq H3K4me1, H3K4me3, H3K27ac) in four primary cell types (monocytes, and in vitro differentiated naive, tolerized and trained macrophages) were characterized. Inflammatory and metabolic pathways were modulated in macrophages, including decreased inflammasome activation, and pathways functionally implicated in trained immunity were identified. Strikingly, β-glucan training elicits an exclusive epigenetic signature, revealing a complex network of enhancers and promoters. Analysis of transcription factor motifs in DNase I hypersensitive sites at cell-type-specific epigenetic loci unveiled differentiation- and treatment-specific repertoires. Altogether, this study provides a resource to understand the epigenetic changes that underlie innate immunity in humans. Illumina HiSeq 2000;, NextSeq 500; 57 fastq
EGAD00001003345 Exome sequence data for 57 HIV elite long term non-progressors and rapid progressors. Complete dataset of improved BAMs mapped to hs37d5 and including phenotype information. 57
EGAD00001002015 The use of reference DNA standards generated from cancer cell lines sequenced in the Cancer Genome Project to establish the sensitivity, specificity, accuracy and reproducibility of the WTSI GCLP sequencing pipeline Illumina HiSeq 2000; 57 cram
EGAD00001002653 Genomic DNA from leukemic and remission bone marrow mononuclear cells was isolated with the QIAamp DNA Blood Extraction Kit (Qiagen, Venlo, The Netherlands). Libraries were prepared with the Illumina TruSeq DNA Sample Prep and TruSeq Exome Enrichment Kits (Illumina, San Diego, CA, USA) according to the manufacturer's recommendations. 100 bp paired-end sequencing was performed on a HiSeq 2000 (Illumina) to about 80x coverage. 57 bam,bai
EGAD00001003211 Deep (>25x mean coverage) whole genome sequencing on 5-10 families drawn from the Scottish Family Health Study with four or more children. HiSeq X Ten;ILLUMINA 57
EGAD00001003253 Targeted gene screen of cell line tumour samples for testing the new V2 Colorectal gene panel. Illumina HiSeq 2000;ILLUMINA 57
EGAD00001001059 DATA FILES FOR SJRHB-WES Illumina HiSeq 2000; 56 bam
EGAD00001001687 Illumina HiSeq 2000; 56 bam,fastq
EGAD00001000994 HIPO blastemal Wilms (nephroblastoma) characterisation of tumor driving chromosomal aberrations Illumina HiSeq 2000;, Illumina HiSeq 2500; 56 fastq
EGAD00001001672 Part of RNA sequencing data of Malignant Lymphoma Study (ICGC) Illumina HiSeq 2000; 56 readme_file,fastq
EGAD00001003353 BAM outputs from STAR (https://github.com/alexdobin/STAR) analysis of RNASeq sequencing on HiSeq platform of 56 tumour samples from 46 melanoma cases. Gene model = Ensembl version 70 56
EGAD00010000276 SCLC tumor genotypes Illumina_2.5M 56
EGAD00001003207 Whole genome sequencing data for MMML (28 tumor/control pairs) 56
EGAD00001000437 UK10K_NEURO_ASD_TAMPERE REL-2013-04-20 Illumina HiSeq 2000; 55 vcf
EGAD00001000285 We propose to definitively characterise the somatic genetics of breast cancer through generation of comprehensive catalogues of somatic mutations in breast cancer cases by high coverage genome sequencing coupled with integrated transcriptomic and methylation analyses. Illumina Genome Analyzer II;, Illumina HiSeq 2000; 55 bam
EGAD00001001444 Atypical teratoid/rhabdoid tumor (ATRT) is one of the most common brain tumors in infants and young children. Although the prognosis of ATRT patients is poor, some patients respond very well to current treatments, suggesting inter-tumor molecular heterogeneity. To investigate this further, we genetically and epigenetically analyzed a large cohort of ATRTs (n = 170). Three distinct molecular subgroups of ATRTs, associated with differences in demographics, tumor location and type of SMARCB1 alterations, were identified using DNA-methylation or gene expression analyses. Whole genome DNA- and RNA-sequencing found no other recurrent mutations explaining the differences between subgroups. However, whole genome bisulfite-sequencing and H3K27Ac ChIP-sequencing of primary tumors revealed clear differences in methylation patterns and enhancer landscapes, leading to the identification of subgroup-specific regulatory networks. Illumina HiSeq 2000;, Illumina HiSeq 2500; 55 fastq
EGAD00001001091 We established and validated a sequence capture based NGS testing approach for PKD1. The presence of six PKD1 pseudogenes and tremendous allelic heterogeneity make molecular genetic testing of PKD1 variants challenging. In the publication accompanying this dataset (An efficient and comprehensive strategy for genetic diagnostics of polycystic kidney disease, Eisenberger et al., PLoS ONE), we demonstrate that the applied standard mapping algorithm specifically aligns reads to the PKD1 locus and overcomes the complication of unspecific capture of pseudogenes. This dataset contains the raw PKD1 reads of all patients from the publication. Illumina HiSeq 1500; 55 fastq
EGAD00010000694 HCC array for cnv 55
EGAD00001001279 McGill EMC Release 4 in tissue "venous blood" for cell type "CD4-positive helper T cell" Illumina HiSeq 2500; 55 fastq
EGAD00001001900 DNA sequencing reads of human adult stem cell cultures from liver, colon and small intestine. Including biopsy or blood samples of the donors. HiSeq X Ten;, Illumina HiSeq 2500; 55 bam
EGAD00001002099 Whole-exome sequencing on Illumina HiSeq2000/2500 of colorectal cancer primary tumor sample Illumina HiSeq 2000; 55
EGAD00001003250 1 cm biopsies from patients undergoing bladder cystectomy will be collected. The underlying muscle and stroma will be removed and the remaining epithelium dissected into small sequential areas, which will be sent for ultra-deep exome sequencing using a panel of known cancer and viral genes. Sequence analysis using methods similar to those of Martincorena I et al. (Science 2015, 348:880) will provide an idea of the somatic mutational landscape in these patient samples. Individual patient muscle samples will also be sequenced as a reference. Illumina HiSeq 2000;ILLUMINA 55
EGAD00001003595 This dataset consists of TLA data for the parents of 9 healthy families and 11 β-thalassemia risk families during pregnancy, together with cell-free DNA sequencing data and fetal DNA sequencing data where available. TLA data was collected for the CFTR region in all healthy families and for the CYP21A2 region in two of the healthy families. TLA data was collected for the HBB region in the risk families. In each pregnant mother, cell-free DNA was collected, enriched for the region of interest using SureSelect pulldown and sequenced. Samples are labelled Mother_X, Father_X and CVS_X for the healthy families and HBB_Mother_X, HBB_Father_X and HBB_CVS_X for the risk families. cfDNA files can be found under the maternal sample, and each consists of three indices used to increase the maximum number of unique molecules per SNP. Both raw and processed cfDNA data are provided. Raw data is mapped using BWA MEM, sorted using samtools and restricted to the region of interest for the sake of patient privacy. Processed data is mapped using BWA MEM, sorted using samtools, duplicate-filtered using samtools rmdup, overlap-clipped using picardtools and restricted to the region of interest. NextSeq 500;ILLUMINA 55
EGAD00001000287 Agilent whole exome hybridisation capture will be performed on genomic DNA derived from 25 renal cancers and matched normal DNA from the same patients. Three lanes of Illumina GA sequencing will be performed on the resulting 50 exome libraries and mapped to build 37 of the human reference genome to facilitate the identification of novel cancer genes. Illumina Genome Analyzer II; 54 bam,srf
EGAD00001000014 Agilent whole exome hybridisation capture will be performed on genomic DNA derived from 25 renal cancers and matched normal DNA from the same patients. Three lanes of Illumina GA sequencing will be performed on the resulting 50 exome libraries and mapped to build 37 of the human reference genome to facilitate the identification of novel cancer genes. Illumina Genome Analyzer II;, Illumina Genome Analyzer II 54 bam,srf
EGAD00001000232 EGAD00001000232_UK10K_NEURO_ASD_TAMPERE_REL_2012_07_05 Illumina HiSeq 2000; 54 vcf
EGAD00001000371 Sequencing data for PDAC cell lines generated by QCMG Illumina HiSeq 2000;, Illumina HiSeq 2500; 54 bam
EGAD00001000134 Sequence reads for pediatric GBM samples for manuscript: Driver mutations in histone H3.3 and chromatin remodelling genes in paediatric glioblastoma Illumina HiSeq 2000, Illumina HiSeq 2500; 54 fastq
EGAD00001000603 We recently used the Agilent SureSelect platform to re-sequence a set of genes known to be mutated in human AML. The results from 10 AML DNA samples were very satisfactory, but the effort required was significant. Thus, we decided to re-sequence the same genes in 48 AML samples using the HaloPlex system for target enrichment. We planned to do this using MiSeq and have data from a pilot of 3 samples. The data is promising, but coverage appears patchy so far. To get a better understanding of the data we will need deeper sequencing: two lanes of HiSeq are needed to get the same degree of coverage as SureSelect. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2000; 54 bam,cram
EGAD00001001115 SeqControl Illumina HiSeq 2500; 54 fastq
EGAD00001001221 Illumina HiSeq 2500; 54 fastq
EGAD00010000558 SNP 6.0 arrays of small cell lung cancer Affymetrix SNP 6.0 54
EGAD00001002189 paired-end BAM files of the sequencing analysis of the mtDNA polymerase gamma (POLG) gene in the MS-affected co-twins Illumina MiSeq; 54 bam
EGAD00001003138 A dataset consisting of Multi-regional Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS) data for 54 samples from 9 patients with hepatocellular carcinoma. The dataset includes 45 tumor samples and 9 normal blood samples. Selected somatic variants were validated by Sequenom. Patients covered are: Patient 1, Patient 2, Patient 3, Patient 4, Patient 5, Patient 6, Patient 7, Patient 8, Patient 9 and Patient 10. Illumina HiSeq 2500;ILLUMINA 54
EGAD00001003586 Whole Genomes Define Concordance in Matched Primary, Xenograft, and Organoid Models of Pancreas Cancer - WGS mapped reads 54
EGAD00001003409 Amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) are part of a clinical, pathological and genetic continuum. The purpose of the present study was to assess the mutation burden in known ALS and/or FTD disease-causing genes in 54 patients (16 with available postmortem neuropathological diagnosis) with concurrent ALS and FTD (ALS/FTD) not carrying the C9orf72 hexanucleotide repeat expansion, the most important genetic cause in both diseases. Illumina HiSeq 2500;ILLUMINA 54
EGAD00010000897 Infinium 450K in Rhabdomyosarcoma Infinium HumanMethylation450 BeadChip 53
EGAD00001002268 PCHiC Illumina HiSeq 2000;ILLUMINA 53 fastq
EGAD00001000609 Whole transcriptome sequencing of 28 untreated prostate cancers, 13 castration resistant prostate cancers, and 12 benign prostatic hyperplasias. Illumina HiSeq 2000; 53
EGAD00001001055 Bam files for the whole exome sequencing from the study on Spatial homogeneity in pediatric brain tumors. Illumina HiSeq 2000; 53
EGAD00001002681 RNA-seq, ChIP-seq, and ATAC-seq files for PCGP SJERG paper titled "Deregulation of DUX4 and ERG in acute lymphoblastic leukemia" Illumina HiSeq 2000 (ILLUMINA) 53
EGAD00001001293 McGill EMC Release 4 for assay "ChIP-Seq Input" Illumina HiSeq 2500; 52 fastq
EGAD00001000650 ICGC MMML-seq Data Freeze July 2013 miRNA sequencing 52 bam
EGAD00001000870 Testing logistics and infrastructure of molecular screening program. Core biopsies taken from invasive recurrent or metastatic breast cancer to evaluate and identify molecular traits rendering them suitable for clinical trials Illumina HiSeq 2500; 52 cram
EGAD00001001393 The aim of this study is to assess translational changes in macrophages over a time course of Salmonella infection. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ Illumina HiSeq 2000; 52 cram
EGAD00001000366 WGBS data of whole blood samples from smoking and non-smoking mothers and their children at gestation/birth and follow-up years. Illumina HiSeq 2000; 52 bam
EGAD00001003137 Metastatic and primary tumour samples were collected from 4 patients with advanced breast cancer. Samples were collected at autopsy and also from biopsies taken during life. Tumour and germline samples are available. Whole exome sequencing was performed on all samples. 52
EGAD00001003281 Genomic alterations driving tumorigenesis result from the interaction of environmental exposures and endogenous cellular processes. With a diversity of risk factors including viral infection, carcinogenic exposures and metabolic diseases, liver cancer is an ideal model to study these interactions. Whole genome sequencing of liver tumors identified 10 mutational signatures showing distinct relationships with environmental exposures, replication and transcription. Transcription-coupled damage was specifically associated with the liver-specific signature 16 and alcohol intake. A flood of indels was identified in very highly expressed hepato-specific genes, likely resulting from replication-transcription collisions. Reconstruction of sub-clonal architecture revealed mutational signature evolution during tumor development, exemplified by the vanishing of the aflatoxin-B1 signature in African migrants. These findings shed new light on the natural history of liver cancers. Illumina HiSeq 2000;ILLUMINA 52
EGAD00000000119 Genotypes from cell lines derived from breast carcinoma tissue Affymetrix 6.0 51
EGAD00001000247 Integrative Oncogenomics of multiple myeloma Illumina HiSeq 2000; 51 bam
EGAD00001000610 Methylated DNA immunoprecipitation sequencing of 28 untreated prostate cancers, 11 castration resistant prostate cancers, and 12 benign prostatic hyperplasias. Illumina HiSeq 2000; 51
EGAD00010000278 SCLC matched normal genotypes Illumina_2.5M 51
EGAD00001000364 We performed low coverage whole genome sequencing of plasma DNA from prostate cancer patients to establish copy number profiles on both a genome-wide and a gene-specific level. The data include plasma samples from prostate cancer patients (n=13), non-malignant controls (males, n=10 and females, n=9), and plasma samples from pregnancies with aneuploid and euploid fetuses (n=4). Furthermore, we sequenced different tumor samples (n=6) of one patient and a serial dilution of HT29 in a background of normal DNA (n=9). Illumina MiSeq; 50 fastq
EGAD00001000070 TMD_AMLK Exome Study Illumina HiSeq 2000, Illumina HiSeq 2000; 50 bam,cram
EGAD00001000119 Chordoma Exome Sequencing Illumina HiSeq 2000, Illumina Genome Analyzer II, Illumina HiSeq 2000; 50 bam
EGAD00001000168 UK10K_RARE_CILIOPATHIES REL-2012-01-13 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 50 vcf
EGAD00001000635 The ETV6-RUNX1 fusion gene, found in 25% of childhood acute lymphoblastic leukemia (ALL), is acquired in utero but requires additional somatic mutations for overt leukemia. We used exome and low-coverage whole-genome sequencing to characterize the critical secondary events associated with leukemic transformation. RAG-mediated deletions emerge as the dominant mutational process, accounting for at least 43% of genomic rearrangements and characterized by the presence of recombination signal sequence motifs near the breakpoints, incorporation of non-templated sequence at the junction and a ten-fold enrichment at promoters and enhancers of genes actively transcribed in early B-lineage development. Single-cell tracking shows that this mechanism is not restricted to one founder cell but is rather active throughout leukemic evolution. Integration of point mutation and rearrangement data identifies recurrent inactivation of ATF7IP and MGA as two new tumor suppressor genes. Thus, a remarkably parsimonious mutational process transforms ETV6-RUNX1 lymphoblasts, striking promoters and enhancers of the genes that normally control B-cell differentiation. Illumina Genome Analyzer II;, Illumina HiSeq 2000; 50 bam,srf
EGAD00001000849 Illumina HiSeq 2000; 50 bam
EGAD00001001271 Around 50 samples of pre-invasive lung cancer lesions showing subsequent clinical and pathological progression or regression HiSeq X Ten; 50 cram
EGAD00001003419 Illumina HiSeq 2000;ILLUMINA 50
EGAD00001002065 Cetuximab is a targeted monoclonal antibody against the epidermal growth factor receptor (EGFR) which is used therapeutically for the treatment of KRAS wild-type colorectal cancer (CRC). The Cetuximab-sensitive KRAS wild-type CRC cell line NCI-H508 has been treated with a fixed concentration of ENU for 24 hours and then selected with Cetuximab until drug-resistant clones were ready to be picked and grown up as sub-clones of the parental cell line. Genes causally implicated in cancer will be sequenced in these clones to identify common point mutations in multiple independently derived drug-resistant clones, as a forward genetic screen for mechanisms of resistance to Cetuximab in CRC. Illumina HiSeq 2500; 50 cram
EGAD00001001607 In this dataset, 16 trios (primary tumor, relapse and corresponding normal) from patients with neuroblastoma are provided. For one patient, more than one relapse was available for the analyses. Illumina HiSeq 2000; 50 bam
EGAD00010000920 samples using Illumina HUMANOMNIEXPRESS HUMANOMNIEXPRESS 50
EGAD00001002527 DEEP (German Epigenome Project) sequence data of the following samples (Sequencing Types: ChIP-Seq, WGBS-Seq, RNA-Seq, sncRNA-Seq, NOMe-Seq, DNase-Seq): 41_Hf01_LiHe_Ct, 41_Hf02_LiHe_Ct, 41_Hf03_LiHe_Ct, 01_HepG2_LiHG_Ct1, 01_HepG2_LiHG_Ct2, 01_HepaRG_LiHR_D31, 01_HepaRG_LiHR_D32, 01_HepaRG_LiHR_D33, 43_Hm01_BlMo_Ct, 43_Hm03_BlMo_Ct, 43_Hm05_BlMo_Ct, 43_Hm03_BlMa_Ct, 43_Hm05_BlMa_Ct, 43_Hm03_BlMa_TO, 43_Hm05_BlMa_TO, 43_Hm03_BlMa_TE, 43_Hm05_BlMa_TE, 51_Hf01_BlCM_Ct, 51_Hf03_BlCM_Ct, 51_Hf04_BlCM_Ct, 51_Hf02_BlCM_Ct, 51_Hf05_BlCM_Ct, 51_Hf06_BlCM_Ct, 51_Hf06_BlCM_T1, 51_Hf06_BlCM_T2, 51_Hf03_BlEM_Ct, 51_Hf04_BlEM_Ct, 51_Hf02_BlEM_Ct, 51_Hf05_BlEM_Ct, 51_Hf06_BlEM_Ct, 51_Hf06_BlEM_T1, 51_Hf06_BlEM_T2, 51_Hf03_BlTN_Ct, 51_Hf04_BlTN_Ct, 51_Hf02_BlTN_Ct, 51_Hf05_BlTN_Ct, 51_Hf06_BlTN_Ct, 51_Hf06_BlTN_T1, 51_Hf06_BlTN_T2, 51_Hf07_BmTM4_Ct, 51_Hf08_BlTM4_Ct, 51_Hf08_BmTM4_SP1, 51_Hf08_BmTM4_SP2, 51_Hf05_BlTA_Ct, 44_Mm01_WEAd_C2, 44_Mm03_WEAd_C2, 44_Mm02_WEAd_C2, 44_Mm07_WEAd_C2, 44_Mm04_WEAd_C1, 44_Mm05_WEAd_C1 Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 50 fastq
EGAD00001003115 Whole genome sequencing data of 15 French Caucasian and 10 African-Caribbean men with prostate Cancer. Illumina HiSeq 2000;ILLUMINA 50
EGAD00001003306 Exome sequencing data of 15 French Caucasian and 10 African-Caribbean men with prostate Cancer. Illumina HiSeq 2000;ILLUMINA 50
EGAD00010001281 SNP array dataset HUMANOMNIEXPRESS 50
EGAD00001003582 Genomics-Driven Precision Medicine for Advanced Pancreatic Cancer - Early Results from the COMPASS Trial - RNA-Seq unmapped reads Illumina HiSeq 2500;ILLUMINA 50
EGAD00001003584 Genomics-Driven Precision Medicine for Advanced Pancreatic Cancer - Early Results from the COMPASS Trial - RNA-Seq mapped reads 50
EGAD00001000047 Exome sequence data for 49 HIV elite long term non-progressors and rapid progressors. Partial dataset (overlap with EGAD00001000087) of raw BAMs mapped to GRCh37_53. Illumina HiSeq 2000, Illumina HiSeq 2000; 49 bam,cram
EGAD00001000836 Illumina HiSeq 2000; 49 bam
EGAD00010000847 Genotyping using Affymetrix SNP6.0 49
EGAD00001001441 Despite the established role of the transcription factor MYC in cancer, little is known about the impact of a new class of transcriptional regulators, the long non-coding RNAs (lncRNAs), on the way MYC is able to influence the cellular transcriptome. To this end, we have intersected RNA-sequencing data from two MYC-inducible cell lines and from a cohort of 91 mature B-cell lymphomas carrying, or not carrying, genetic variants resulting in MYC over-expression. By this approach, we identified 13 lncRNAs differentially expressed in IG-MYC-positive Burkitt lymphoma and regulated in the same direction by MYC in the model cell lines. Among them we focused on a lncRNA that we named MINCR, for MYC-Induced long Non-Coding RNA, showing a strong correlation with MYC expression in MYC-positive lymphomas and also in pancreatic ductal adenocarcinomas. To understand its cellular role we performed RNA interference (RNAi) experiments and found that MINCR knock-down is associated with a reduction in cellular viability, due to an impairment in cell cycle progression. Differential gene expression analysis following RNAi showed a strongly significant enrichment of cell cycle genes among the genes down-regulated following MINCR knock-down. Interestingly, these genes are enriched in MYC binding sites in their promoters, suggesting that MINCR acts as a modulator of the MYC transcriptional program. Accordingly, following MINCR knock-down, we observed a reduction in the binding of MYC to the promoters of selected cell cycle genes. Finally, we provide evidence that down-regulation of AURKA, AURKB and CTD1 may explain the reduction in cellular proliferation observed upon MINCR knock-down. We therefore suggest that MINCR is a newly identified player in the MYC transcriptional network able to control the expression of cell cycle genes. Illumina HiSeq 2000;, Illumina HiSeq 2500; 49 fastq
EGAD00001001338 We propose to definitively characterise the somatic genetics of breast cancer through generation of comprehensive catalogues of somatic mutations in breast cancer cases by high coverage genome sequencing coupled with integrated transcriptomic and methylation analyses. Illumina HiSeq 2000;ILLUMINA, Illumina Genome Analyzer II;ILLUMINA 49 bam,srf
EGAD00001002739 Aligned sequence data from 14 Prostate cancer samples with BRCA2 mutations 49 bam
EGAD00001003153 Sequencing of untreated pancreatic cancer metastases and primary tumor sections. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 49
EGAD00001003362 RNAseq on Illumina HiSeq2000/2500 of Patient-derived xenograft derived from colorectal cancer primary tumor sample (EPO2_cohort) Illumina HiSeq 2000;ILLUMINA 49
EGAD00001003579 Samples were prepared using Safe-SeqS technology. All samples were run on an Illumina MiSeq instrument. FASTQ files are provided for read 1 and the index read (labelled R and I, respectively). Illumina MiSeq;ILLUMINA 49
EGAD00001003276 Whole genome sequencing data for MMML (24 tumor/control pairs), fastq-files Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 49
EGAD00001000319 UK10K_NEURO_GURLING REL-2012-11-27 Illumina HiSeq 2000; 48 vcf
EGAD00001000167 UK10K_RARE_HYPERCHOL REL-2012-01-13 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 48 vcf
EGAD00001000175 Identification of SPEN as a novel cancer gene and FGFR2 as a potential therapeutic target in adenoid cystic carcinoma Illumina Genome Analyzer II; 48 bam
EGAD00001000131 Genetic landscape of hepatocellular carcinoma Illumina HiSeq 2000 48 bam
EGAD00001000314 UK10K_NEURO_ASD_TAMPERE REL-2012-11-27 Illumina HiSeq 2000; 48 vcf
EGAD00001000069 Lung Rearrangement Study Illumina HiSeq 2000 48 bam
EGAD00001000421 The aim of this project is to identify rare variants in the 1q region associated with type 2 diabetes. To this end 651 case samples and 651 control samples from six populations have been pooled (pool sizes range from 27-33 individuals), and are being sequenced. The hybridization solution being used captures the exons and UTRs of genes in the 1q region. Illumina HiSeq 2000; 48 bam
EGAD00001000733 The dataset comprises 48 RRBS libraries of 24 sibling pairs. 24 individuals were conceived during the Dutch Famine, a severe 6-month famine at the end of World War 2. A same-sex sibling was added as a control, allowing partial matching for (early) familial environment and genetics. Illumina Genome Analyzer IIx; 48 bam
EGAD00001000797 This project aims to study at least 90 exomes from families with congenital heart disease. The samples have been selected at the Royal Brompton Hospital in collaboration with Stuart Cook and Piers Daubeney. Ethical approval has been sought in the UK, and an HDMMC agreement for submitting these samples is in place at the WTSI. The phenotypes on which we will primarily focus our analysis are severe Left Ventricular Outflow Tract Obstructions (LVOTO) and Atrioventricular Septal Defect (AVSD). The indexed Agilent whole exome pulldown libraries will be sequenced on 75bp PE HiSeq (Illumina). Illumina HiSeq 2000; 48 bam
EGAD00001000848 To evaluate the presence of mutations in frequently mutated genes in MPN by performing targeted resequencing of a selected gene panel comprising 111 genes across 40 samples with MPN. Illumina MiSeq; 48 cram
EGAD00001000978 Multi-region whole genome sequencing of a high-grade serous ovarian carcinoma sample for characterization of genomic intra-tumoural heterogeneity. Illumina HiSeq 2000; 48 bam
EGAD00001001274 Brain samples for this dataset were provided by the Medical Research Council Sudden Death Brain and Tissue Bank (Edinburgh, UK). All four individuals sampled were of European descent, neurologically normal during life and confirmed to be neuropathologically normal by a consultant neuropathologist using histology performed on sections prepared from paraffin-embedded tissue blocks. Twelve regions of the central nervous system were sampled from each individual. The regions studied were: cerebellar cortex, frontal cortex, temporal cortex, occipital cortex, hippocampus, the inferior olivary nucleus (sub-dissected from the medulla), putamen, substantia nigra, thalamus, hypothalamus, intralobular white matter and cervical spinal cord. Illumina HiSeq 2000; 48 bam
EGAD00001001229 ChIP-Seq (H3K27ac) assays for reference epigenomes generated by Centre for Epigenome Mapping Technologies at Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada as part of the International Human Epigenome Consortium. Illumina HiSeq 2000;ILLUMINA 48 bam
EGAD00001001230 ChIP-Seq (H3K27me3) assays for reference epigenomes generated by Centre for Epigenome Mapping Technologies at Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada as part of the International Human Epigenome Consortium. Illumina HiSeq 2000;ILLUMINA 48 bam
EGAD00001001231 ChIP-Seq (H3K36me3) assays for reference epigenomes generated by Centre for Epigenome Mapping Technologies at Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada as part of the International Human Epigenome Consortium. Illumina HiSeq 2000;ILLUMINA 48 bam
EGAD00001001233 ChIP-Seq (H3K4me3) assays for reference epigenomes generated by Centre for Epigenome Mapping Technologies at Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada as part of the International Human Epigenome Consortium. Illumina HiSeq 2000;ILLUMINA 48 bam
EGAD00001001232 ChIP-Seq (H3K4me1) assays for reference epigenomes generated by Centre for Epigenome Mapping Technologies at Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada as part of the International Human Epigenome Consortium. Illumina HiSeq 2000;ILLUMINA 48 bam
EGAD00001001234 ChIP-Seq (H3K9me3) assays for reference epigenomes generated by Centre for Epigenome Mapping Technologies at Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada as part of the International Human Epigenome Consortium. Illumina HiSeq 2000;ILLUMINA 48 bam
EGAD00001001235 ChIP-Seq (Input) assays for reference epigenomes generated by Centre for Epigenome Mapping Technologies at Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada as part of the International Human Epigenome Consortium. 48 bam
EGAD00001001405 Fastq data for ChIP-Seq (H3K36me3) assays for reference epigenomes generated at Centre for Epigenome Mapping Technologies, Genome Sciences Center, B.C. Cancer Agency, Vancouver, Canada as part of the International Human Epigenome Consortium. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 48 fastq
EGAD00001001404 Fastq data for ChIP-Seq (H3K27me3) assays for reference epigenomes generated at Centre for Epigenome Mapping Technologies, Genome Sciences Center, B.C. Cancer Agency, Vancouver, Canada as part of the International Human Epigenome Consortium. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 48 fastq
EGAD00001001403 Fastq data for ChIP-Seq (H3K27ac) assays for reference epigenomes generated at Centre for Epigenome Mapping Technologies, Genome Sciences Center, B.C. Cancer Agency, Vancouver, Canada as part of the International Human Epigenome Consortium. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 48 fastq
EGAD00001001406 Fastq data for ChIP-Seq (H3K4me1) assays for reference epigenomes generated at Centre for Epigenome Mapping Technologies, Genome Sciences Center, B.C. Cancer Agency, Vancouver, Canada as part of the International Human Epigenome Consortium. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 48 fastq
EGAD00001001407 Fastq data for ChIP-Seq (H3K4me3) assays for reference epigenomes generated at Centre for Epigenome Mapping Technologies, Genome Sciences Center, B.C. Cancer Agency, Vancouver, Canada as part of the International Human Epigenome Consortium. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 48 fastq
EGAD00001001408 Fastq data for ChIP-Seq (H3K9me3) assays for reference epigenomes generated at Centre for Epigenome Mapping Technologies, Genome Sciences Center, B.C. Cancer Agency, Vancouver, Canada as part of the International Human Epigenome Consortium. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 48 fastq
EGAD00001001409 Fastq data for ChIP-Seq (Input) assays for reference epigenomes generated at Centre for Epigenome Mapping Technologies, Genome Sciences Center, B.C. Cancer Agency, Vancouver, Canada as part of the International Human Epigenome Consortium. Illumina HiSeq 2000;ILLUMINA, Illumina HiSeq 2500;ILLUMINA 48 fastq
EGAD00010000870 DNA methylation microarray Illumina_Infinium_HumanMethylation450 48
EGAD00001001942 We performed target re-sequencing of a 1.29 Mb interval of chromosome 9 (chr9:21299764–22590271, hg19). The NimbleGen SeqCap EZ Choice system (Roche Diagnostics) was used for target enrichment. A DNA probe set complementary to the target region was designed with NimbleDesign. The libraries were sequenced on the Illumina MiSeq platform with the 2×150-bp paired-end module (Illumina). Fastq files for 48 Japanese patients with endometriosis are deposited. Illumina MiSeq; 48
EGAD00010001196 Raw Array data from the CPCGene BRCA study Affymetrix OncoScan FFPE Express 48
EGAD00001000180 UK10K_RARE_NEUROMUSCULAR REL-2012-01-13 Illumina HiSeq 2000; 47 vcf
EGAD00001000604 In order to progress human induced pluripotent stem cells (hiPSCs) towards the clinic, several outstanding questions must be addressed. It is possible to reprogram different somatic cell types into hiPSCs, and studies in the mouse suggest that an epigenetic memory of the starting cell type is carried over to hiPSCs. However, a comprehensive comparative study of the characteristics of these hiPSCs has been missing from the literature. Importantly, studies which aimed to address these aspects of hiPSCs have used cells from different patients. To avoid this important confounding variable and keep the genetic background constant, tissue samples were procured from the patients and reprogrammed to iPS cells. The transcriptomes of these iPS cells will be compared. Protocol: primary cultures of cells were reprogrammed to iPS cells. RNA was extracted using a standard column extraction kit. Illumina HiSeq 2000; 47 bam
EGAD00001000663 This study aims to re-sequence findings from whole genome studies using a bespoke pulldown method to validate mutations in those genomes sequenced. Illumina HiSeq 2000; 47 bam
EGAD00001000397 The Cardiogenics re-sequencing study will consist of three parts: Eight pools of 25 individuals will be sequenced using a Nimblegen hybrid-capture solution specific to miRNA sequences, 80 pools of 25 individuals will be sequenced using a custom Agilent SureSelect array covering genes associated with coronary artery disease (CAD) and myocardial infarction (MI), 10 individuals from families with a history of CAD/MI will be exome sequenced using the Sanger exome array. The experiment will use the early onset patients from the German MI cohort and the UK BHF CAD/MI cohort both of which have strong family history. For controls we will consider individuals from the UKBS and KORA cohorts. Illumina HiSeq 2000; 47 bam
EGAD00001001364 This dataset contains whole exome data from 8 esophageal adenocarcinoma tumors that have been subjected to multiregion sequencing, ranging from 3 to 8 regions per tumor. In total, 40 tumor samples and 8 normal blood samples have been sequenced on the Illumina HiSeq 2500 at a median depth of 90x. Illumina HiSeq 2500; 47 bam
EGAD00001003097 High-coverage sequencing data from 47 Yemeni samples HiSeq X Ten;ILLUMINA 47 cram
EGAD00001000165 DATA FILES FOR SJINF Illumina HiSeq 2000 46 bam
EGAD00001000178 UK10K_RARE_CHD REL-2012-01-13 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 46 vcf
EGAD00001000192 UK10K_RARE_CHD REL-2012-02-22 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 46 vcf
EGAD00001000355 ICGC MMML-seq Data Freeze March 2013 whole genome sequencing Illumina HiSeq 2000; 46 bam
EGAD00001000097 Matched breast cancer fusion gene study Illumina Genome Analyzer II 46 bam,srf
EGAD00001000267 This study uses a focused bespoke bait pull-down library method to target findings of chordoma whole genome and whole exome sequencing studies in order to validate them. This method will also be used on a larger set of tumour-only samples in order to determine the prevalence of these findings in a larger set of patient samples. Illumina HiSeq 2000; 46 bam
EGAD00001000661 Bespoke validation experiments will be performed on ER+ Breast Cancer cases to confirm the presence of mutations found in whole genome sequencing. Illumina HiSeq 2000; 46 bam
EGAD00001000662 We propose to definitively characterise the somatic genetics of Triple negative breast cancer through generation of comprehensive catalogues of somatic mutations in 500 cases by high coverage genome sequencing coupled with integrated transcriptomic and methylation analyses. This study will use a bespoke bait set to pulldown regions of interest found in whole genome sequencing to validate mutations found. Illumina HiSeq 2000; 46 bam
EGAD00001000695 DATA FILES FOR SJLGG Illumina HiSeq 2000; 46 bam
EGAD00001000402 The study will analyse by exome sequencing 42 Greek patients with premature MI and no vessel disease to identify genetic factors underlying this condition. Illumina HiSeq 2000; 46 bam
EGAD00001001113 Illumina HiSeq 2000; 46 bam
EGAD00001000440 UK10K_NEURO_GURLING REL-2013-04-20 Illumina HiSeq 2000;ILLUMINA 46 bam,vcf
EGAD00001003405 High-coverage WGS of DNA samples from 23 pairs of GCs was performed on the Illumina HiSeq X Ten System. Illumina HiSeq 2000;ILLUMINA 46
EGAD00001001112 Illumina HiSeq 2000; 46
EGAD00001001111 Illumina HiSeq 2000; 46
EGAD00001001110 Illumina HiSeq 2000; 46
EGAD00001001109 Illumina HiSeq 2000; 46
EGAD00001003564 The aim of the project is the definition of the molecular defect in a cohort of Rett-like patients negative for mutations in known disease genes. To this aim, a number of unrelated trios (patients plus parents) will be analysed by exome sequencing. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ This dataset contains all the data available for this study on 2017-08-16. Illumina HiSeq 2500;ILLUMINA 46
EGAD00001003792 The dataset for High Grade Serous Ovarian Carcinomas Originate in the Fallopian Tube includes 46 bam files from next-generation sequencing on the Illumina HiSeq2500. The samples analyzed include multiple lesions from nine patients, five with high grade serous ovarian carcinoma and four who are BRCA-carriers. Illumina HiSeq 2500;ILLUMINA 46
EGAD00001000353 DATA FILES FOR SJLGG Illumina HiSeq 2000; 45 bam
EGAD00001000719 Dataset of CageKid Normal RNA samples 45 bam
EGAD00001000764 Adrenocortical carcinomas (ACC) are aggressive cancers originating in the cortex of the adrenal glands. Despite the overall poor prognosis, ACC outcome is heterogeneous. CTNNB1 and TP53 mutations are frequent in these tumors, but the complete spectrum of genetic changes remains undefined. Exome sequencing and SNP array analysis of 45 ACC revealed recurrent alterations in known drivers (CTNNB1, TP53, CDKN2A, RB1, MEN1) and in genes not previously reported to be altered in ACC (ZNRF3, DAXX, TERT and MED12), which were validated in an independent cohort of 77 ACC. The cell-surface transmembrane E3 ubiquitin ligase ZNRF3 was the most frequently altered gene (21%) and appears to be a potential novel tumor suppressor gene related to the β-catenin pathway. Our integrated genomic analyses led to the identification of two distinct molecular subgroups with opposite outcomes. The C1A group of poor-outcome ACC was characterized by numerous mutations and DNA methylation alterations, whereas the C1B group with good prognosis displayed a specific deregulation of two miRNA clusters. Thus, aggressive and indolent ACC correspond to two distinct molecular entities, driven by different oncogenic alterations. Illumina HiSeq 2000; 45 bam
EGAD00001000947 Genomic libraries (500 bp) will be generated from total genomic DNA derived from colorectal cancer patients and subjected to short paired-end sequencing on the Illumina platform. Paired reads will be mapped to build 37 of the human reference genome to facilitate the generation of genome-wide copy number information and the identification of novel rearranged cancer genes and gene fusions. Illumina HiSeq 2000; 45 cram
EGAD00001001620 release_2: ICGC PedBrain: RNA sequencing Illumina HiSeq 2000; 45 fastq
EGAD00001002658 Highly purified mesenchymal cells (CD45-/7AAD-/CD235a-/CD31-/CD271+/CD105+) were prospectively FACS-isolated from bone marrow specimens of 45 low-risk myelodysplastic syndrome (LRMDS) cases. Gene expression profiles (GEPs) of the 45 LRMDS have been compared to GEPs derived from likewise highly purified mesenchymal cells obtained from bone marrow specimens of healthy donors for the identification of inflammatory signatures. Additionally, an overlap in inflammatory signatures has been determined by comparing the GEPs of these 45 LRMDS cases to the GEPs of 4 Shwachman-Diamond syndrome and 3 Diamond-Blackfan anemia cases, both representing different subclasses of congenital pre-leukemia syndromes with a tendency of leukemic progression and perturbed niche compartment. Finally, the GEPs and gene expression signatures have been utilized for prognostication and the prediction of leukemic progression. Illumina HiSeq 2500;ILLUMINA 45 fastq
EGAD00010001145 HipSci - Bardet-Biedl Syndrome - Methylation Array - October 2016 Illumina 45
EGAD00001003565 The project is focused on the axonal forms of Charcot-Marie-Tooth (CMT) disease. We have selected 13 families (7 from Spain and 6 from the Czech Republic) that have been clinically assessed in depth and previously tested for mutations in known CMT genes without any causal variants being identified. In these patients we expect to discover several CMT2 genes. We therefore requested exome sequencing of 45 DNA samples: 27 exomes from the families in Spain and 18 exomes from the families in the Czech Republic. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ This dataset contains all the data available for this study on 2017-08-16. Illumina HiSeq 2500;ILLUMINA 45
EGAD00010001315 Single-cell transcriptomics of PBMCs of 47 donors from the Lifelines Deep cohort (general population, northern part of the Netherlands). Cells of five or six different donors were pooled together in one sample pool, resulting in eight different sample pools. In total, 28,855 cells were captured and their transcriptomes were sequenced to an average depth of 74k. Genotype data was available for each donor, which allowed us to use the Demuxlet method, which uses variable SNPs between the pooled individuals to determine which cell belongs to which individual. Since genotype information is lacking for 2 individuals, the transcriptomes of only 45 individuals could be retrieved. Illumina HiSeq4000 45
EGAD00001000171 UK10K_RARE_FIND REL-2012-01-13 Illumina Genome Analyzer II;, Illumina HiSeq 2000; 44 vcf
EGAD00001000162 DATA FILES FOR SJEPD Illumina HiSeq 2000 44 bam
EGAD00001000641 DNA replication errors occurring in mismatch repair (MMR) deficient cells persist as mismatch mutations and predispose to a range of tumors. Here, we sequenced the first whole genomes from MMR-deficient endometrial tumors. Complete Genomics;, Illumina HiSeq 2000; 44 CompleteGenomics_native,bam
EGAD00001000704 Illumina HiSeq 2000; 44 bam
EGAD00001000816 ICGC medulloblastoma whole genome sequencing data, ICGC release 16 44 bam
EGAD00001000845 44 bam
EGAD00001001289 McGill EMC Release 4 for assay "Bisulfite-seq": Methylation profiling by high-throughput sequencing Illumina HiSeq 2500; 44 fastq
EGAD00001000987 Whole exome sequencing data from tumor and normal samples from carcinosarcoma (malignant mixed mullerian tumor) patients Illumina HiSeq 2000; 44
EGAD00001002676 DATA FILES FOR PCGP SJERG (WGS) Illumina HiSeq 2000;ILLUMINA 44 bam
EGAD00001000237 EGAD00001000237_UK10K_NEURO_GURLING_REL_2012_07_05 Illumina HiSeq 2000; 43 vcf
EGAD00001000046 Gastric Cancer Exome Sequencing Illumina HiSeq 2000, Illumina Genome Analyzer IIx 43 fastq
EGAD00001000007 Osteosarcoma Sequencing Illumina Genome Analyzer II;, Illumina Genome Analyzer II 43