Data access policy
- General information
- ArrayExpress data access accounts
- Submitter anonymity
- Release policy
- Release date changes
- Removal of data from the public domain
1. General information
Experiments (sets of assays) and array designs in ArrayExpress are either 'public' or 'private' (often pre-publication and under peer review). Data will be kept private until a specified release date, or until an associated paper containing the experiment or array design accession number is published. Access to private data is possible via submitter and reviewer login accounts created by ArrayExpress upon successful loading of an experiment or array design into the database.
In private status, a submitter or reviewer can view the full ArrayExpress record and download all associated files. There are two exceptions:
- For sequencing experiments, the raw FASTQ data files cannot be accessed until the experiment is made public. This is because raw data files are brokered to the European Nucleotide Archive (ENA), which is part of the Sequence Read Archive (SRA), a collaboration between ENA (EBI), GenBank (NCBI) and DDBJ (Japan). There is currently no infrastructure for accessing privately held data files in SRA.
- Experiments tagged for double-blind peer review hide submitter-related meta-data and block viewing of the full samples table, while data download remains unaffected. See section 3, Submitter anonymity, for details.
For sensitive data from human samples and individuals that could potentially lead to the identification of the donors (e.g. genomic DNA sequences), ArrayExpress 'private' data sets and submitter/reviewer login accounts cannot provide the high level of data encryption and access control required. In such cases, access to the data should be governed by a data access committee that oversees data privacy issues, and the data should be submitted to the European Genome-phenome Archive (EGA).
Human-identifiable data can still be submitted to ArrayExpress if consent has been given for their public release. Such consent is typically approved by the relevant ethics committee, and ensuring that it is in place is the responsibility of the submitter.
2. ArrayExpress data access accounts
Submitters are sent details of their own access account, and of the accounts for journal reviewers, after a pre-publication experiment or array design is loaded into the ArrayExpress database. Submitters can view all of their private experiments and array designs using a single login. Each reviewer account gives access to only one experiment or array design.
Please refer to the page on accessing private data for instructions on using the access accounts and on retrieving login details if you have lost them.
3. Submitter anonymity
To reduce potential bias during peer review, some journals now offer authors the option of "double-blind" peer review, in which the authors remain anonymous to the reviewers. ArrayExpress supports "double-blind" peer review of private data sets by redacting meta-data fields that would otherwise reveal the submitter's identity, e.g. "Contacts" (name and email address) and "Citation" (which contains the preliminary publication title and author list). Redaction is optional and applied at the submitter's request. It only takes effect when a reviewer accesses the ArrayExpress experiment (detected by the reviewer's login name), so the submitter, using their dedicated login, continues to see the full meta-data record. Anonymity can also be lifted at the submitter's request, e.g. when the manuscript is submitted to a journal that does not support "double-blind" peer review. Finally, submitter anonymity is lifted completely once the experiment becomes public.
If you are the submitter of a previously non-anonymised private data set in ArrayExpress and would like to switch on anonymity, please write to us at email@example.com.
4. Release policy
Experiments are made public when:
- the specified release date is reached
- the submitter emails to tell us that the data can now be released (usually this is when an associated publication is accepted or published)
- we identify a publication in which the ArrayExpress experiment accession number is cited
Array designs are made public when:
- the specified release date is reached
- the submitter emails to tell us that the data can now be released
- an experiment linked to the array design is made public
- we identify a publication in which the ArrayExpress array design accession number is cited
- we identify a publication in which the ArrayExpress array design accession number is cited
When an experiment or array design is made public, the data are available through the ArrayExpress web interface and the FTP site. More information is available on the ArrayExpress FTP downloads help page.
5. Release date changes
Reminders about experiment and array design release dates are sent to submitters 7, 30 and 60 days before the current release date.
The release date can be changed using the access control tool. Log in with the data access account details that we emailed you when curation was completed, click the calendar symbol next to the private experiment whose release date you would like to change, and follow the on-screen instructions. These are the same account details you would normally use to log in to the ArrayExpress website and view your private experiment(s). If you have forgotten the username and/or password, you can retrieve your login details using the submitter's email address and the experiment accession number.
6. Removal of data from the public domain
Data may be made private again in ArrayExpress (removing both web interface and FTP access) in the following circumstances:
- submitters realise, after their experiment or array design has been made public, that they failed to request a change to the release date. Note that we regularly import data from the Gene Expression Omnibus (GEO), and any data that GEO notifies us as withdrawn will also be withdrawn from ArrayExpress.
- data are found to be incorrect, e.g. sample annotations or sample-to-data-file associations are wrong. The data will be removed from ArrayExpress until they can be corrected.
- data are found to have been submitted to ArrayExpress without the permission of the rightful owner; this is expected to be extremely rare and requires formal institutional contact with the submitting institution.
Where possible, data are made private again at the next daily update of ArrayExpress. Because the data will already have been distributed publicly, ArrayExpress cannot exercise any control over their subsequent use by third parties.
Array designs cannot be made private again if they are linked to a public experiment. Array designs will be kept private if they are linked only to private experiments.
Experiments that have been public may be included in the Expression Atlas. If an experiment is later made private again in ArrayExpress, it will also be removed from the Atlas. However, the Atlas has a monthly, rather than daily, release schedule, and data will only be removed at the next release unless there are extenuating circumstances.
High-throughput sequencing read data that are submitted to ArrayExpress but transferred to the European Nucleotide Archive (ENA) are subject to ENA's data release policies. In general, ENA does not make a public experiment private again unless there is an exceptional reason. Please also refer to our policy on updating/cancelling a sequencing experiment for more details.