This site is no longer being maintained. For the latest documentation, please visit https://dcc-documentation.readthedocs.io/en/latest/.

Overview

This page will guide you through the process of submitting your samples to BioSamples in order to receive your sample accession IDs.  We have provided templates and tools to help validate your samples metadata.  If you have any questions about this process please contact FAANG Data Coordination Centre for help.

Prerequisites:

Please familiarise yourself with the latest sample ruleset specification and the FAANG data sharing principles.

IMPORTANT: The data validation service is not compatible with templates prepared using LibreOffice Calc. Please use Microsoft Excel, saving as xlsx, or Google Sheets, exporting as xlsx.

In order to submit samples to BioSamples there are six steps to carry out. Each step is explained in more detail below:

  1. Download the Excel template
  2. Complete the template
  3. Validate the filled out template
  4. Convert the validated Excel file
  5. Submit your SampleTab document
  6. Update existing BioSamples records

A workshop about Sample submission was held in April 2016. You can find out more about this workshop and its course materials here.  If there is sufficient demand, further workshops could be arranged in future.

1. Download the empty Excel template

Download the latest version of the Excel template, last updated 12th December 2018: 

You can also download an example template to refer to for advice on completion:

2. Completing the template


Complete the template following the instructions below on this page and referring to the latest sample ruleset specification. The rules for each attribute define if it is mandatory or optional, what sort of data is expected (numeric, date, text, etc.), what units are permitted, and whether or not an ontology term is required.


The Excel template contains a series of separate worksheet tabs to capture information about the submitter, organisms, specimens and cells. Do not alter or delete any of the existing column headings or tabs from the file. However, it is fine to add column headings to capture customized extra information, or to duplicate column headings to provide multiple values for one field.

This guide will deal with each tab in turn:

Submission tab

Provide a submission title and submission description.

  • "Submission Title".  The suggested title format is "EBI-FAANG-HARRISON-Sheep-160414".  This is constructed from your organisation/group abbreviation, the "FAANG" project (to help identify your samples), the submitter's surname, the common species name and the date of submission.
  • "Submission Description".  Briefly describe the samples and study.  This description will be displayed on BioSamples to describe all of the samples in your BioSamples group so it is important that it is informative and accurately describes the samples in the study.  Please ensure that the common species name is included in this description.
  • IMPORTANT: Leave all other fields on the submission worksheet blank.  The "Submission Identifier" field is required only when updating metadata for existing BioSamples records; this is described at the bottom of the page.
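
The suggested title format can be assembled programmatically. The following is a minimal sketch, not part of the FAANG tooling: the function name is hypothetical, and it assumes the trailing date is written as YYMMDD, which this guide does not state explicitly.

```python
from datetime import date


def submission_title(organisation: str, surname: str, species: str, submitted: date) -> str:
    """Assemble a title in the suggested style, e.g. 'EBI-FAANG-HARRISON-Sheep-160414'."""
    parts = [
        organisation.upper(),          # organisation/group abbreviation
        "FAANG",                       # project tag, always the same
        surname.upper(),               # submitter's surname
        species.capitalize(),          # common species name
        submitted.strftime("%y%m%d"),  # assumption: date is YYMMDD
    ]
    return "-".join(parts)


print(submission_title("EBI", "Harrison", "sheep", date(2016, 4, 14)))
# EBI-FAANG-HARRISON-Sheep-160414
```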

Person tab

Add the full details and email address of the people responsible for producing the data.  At least one person must be listed.

For each person a role should be provided from the list of possible organization roles. A complete list is available here:

  • array manufacturer
  • biomaterial provider
  • biosequence provider
  • consortium member
  • consultant
  • curator
  • data analyst
  • data coder
  • experiment performer
  • funder
  • hardware manufacturer
  • institution
  • investigator
  • material supplier role
  • peer review quality control role
  • software manufacturer
  • submitter

A person can have more than one role. List them in full once for each role; see the example of John Smith below.  At least one person should be listed as the submitter (this is you!).

Person Last Name | Person Initials | Person First Name | Person Email | Person Role
Smith | J | John | john@someplace.ac.uk | Submitter
Smith | J | John | john@someplace.ac.uk | experiment performer
Bloggs | LE | Laura | laura@someplace.ac.uk | data analyst

Organisation tab

Add the full details of the organisations and funding bodies responsible for producing the data.  At least one organisation must be listed.

For each organisation a role should be provided from the same list as the person tab above.

Multiple organizations can be recorded here as appropriate, one per line.  If an organization has more than one role, then it should be listed in full on multiple lines; see the example for the Roslin Institute below.

Organization Name | Organization Address | Organization URI | Organization Role
EMBL-EBI | Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom | http://www.ebi.ac.uk/ | curator
The Roslin Institute and Royal Dick School of Veterinary Studies | Easter Bush Campus, Edinburgh, Midlothian, EH25 9RG, United Kingdom | http://www.roslin.ed.ac.uk/ | institution
The Roslin Institute and Royal Dick School of Veterinary Studies | Easter Bush Campus, Edinburgh, Midlothian, EH25 9RG, United Kingdom | http://www.roslin.ed.ac.uk/ | biomaterial provider
BBSRC | Polaris House, North Star Avenue, Swindon, Wiltshire, SN2 1UH, United Kingdom | http://www.bbsrc.ac.uk/ | funder

Publication tab

The samples in the submission can be associated with publications.  As FAANG samples are expected to be submitted early in the process, this is less likely to be used in initial submissions, but publications can be added at a later date if required.  Once running, the FAANG data portal will incorporate automated literature mining that links samples to publications that use the identifiers in the manuscript text. A DOI is expected to be provided for each publication.

IMPORTANT: The listed publication is applied to all samples in the submission spreadsheet. If you wish to attach different publications to different samples, or to attach a publication to only a subset of samples, then you will need to separate the samples into a different submission Excel template document for each publication.

Database tab

The database tab tells BioSamples how to interpret references to external databases. This is an advanced option and can be ignored.

Term source tab

The term source tab tells BioSamples how to interpret all of the references to ontologies later in the submission. We have pre-filled it with the ontologies that you should be using.

Ontology is a way of describing objects of interest and how they relate to one another. Data annotated to ontologies can be used by computers to ask sensible questions and get back sensible answers, because they understand the meaning of the words they are looking at.  

Where an attribute should be a reference to an ontology term, it is always recorded in three parts: 1) the description text of the ontology term, 2) the term source that the term comes from, using the short REF name listed in this tab, and 3) the term source ID of the term within that ontology. The description text for the ontology attribute should precisely match the ontology term. The latest sample ruleset specification records which ontologies and terms can be used for each attribute. For example, for breed you can use the term 'Thoroughbred' from the Term Source REF 'LBO' and the Term Source ID 'LBO_0000910'. For an ontology term to be valid, its ontology must already be specified in the 'term source' tab. To help find appropriate ontology terms, we prefer to use the new version of the EMBL-EBI Ontology Lookup Service (OLS).
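
The three-part structure described above can be modelled as a small record with a few sanity checks. This is an illustrative sketch only, not the FAANG validator: the class and function names are hypothetical, the set of REF names is an example subset rather than the real contents of the term source tab, and the check that an ID begins with its REF name is an assumption that holds for the examples on this page but may not hold for every ontology.

```python
from dataclasses import dataclass

# Assumption: an illustrative subset of the short REF names on the term source tab.
TERM_SOURCES = {"LBO", "EFO", "OBI", "UBERON"}


@dataclass(frozen=True)
class OntologyValue:
    text: str        # description text, e.g. 'Thoroughbred'
    source_ref: str  # Term Source REF, e.g. 'LBO'
    source_id: str   # Term Source ID, e.g. 'LBO_0000910'


def problems(value: OntologyValue) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    found = []
    if not value.text.strip():
        found.append("description text is empty")
    if value.source_ref not in TERM_SOURCES:
        found.append(f"term source {value.source_ref!r} is not on the term source tab")
    # Assumption: IDs are prefixed with their REF name, as in 'LBO_0000910'.
    if not value.source_id.startswith(value.source_ref + "_"):
        found.append(f"ID {value.source_id!r} does not start with {value.source_ref!r}")
    return found


print(problems(OntologyValue("Thoroughbred", "LBO", "LBO_0000910")))
# []
```

A check like this cannot confirm that the description text matches the ontology term; that still needs a lookup in OLS or the validation tool.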

If you wish to use an ontology not listed by default in the example template then it needs to be added here and you also need to contact FAANG Data Coordination Centre (DCC) to let us know that you have done this.

Animal / Specimen / Purified Cells / Cell culture / Cell line tabs

Sample information is recorded in the same way on each of these tabs, so they are described together.  You should also refer to the latest sample ruleset specification.

Sample Names

Sample Names should contain a short species code, an abbreviated name for the institute/lab, and a unique ID for each sample, e.g. 'ECA_ROSLIN_H1'

Accepted short species codes (please contact FAANG DCC if you are submitting a species not listed here):

    • Bubalus bubalis BBU
    • Bos taurus BTA
    • Bos indicus BIN
    • Capra hircus CHR
    • Equus caballus ECA
    • Gallus gallus GGA
    • Ovis aries OAR
    • Sus scrofa SSC

Please check the organisation/group abbreviation page to find or add your organisation/group abbreviation for consistency across institutes.
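
The naming convention above lends itself to a quick check before validation. This sketch is not the official validator: the function name is hypothetical, and it assumes that the three parts are joined by underscores, as in the 'ECA_ROSLIN_H1' example.

```python
from typing import Optional

# Accepted short species codes, taken from the list above.
SPECIES_CODES = {
    "BBU": "Bubalus bubalis",
    "BTA": "Bos taurus",
    "BIN": "Bos indicus",
    "CHR": "Capra hircus",
    "ECA": "Equus caballus",
    "GGA": "Gallus gallus",
    "OAR": "Ovis aries",
    "SSC": "Sus scrofa",
}


def check_sample_name(name: str) -> Optional[str]:
    """Return a problem description, or None if the name follows the convention."""
    # Assumption: species code, institute and ID are separated by underscores.
    parts = name.split("_", 2)
    if len(parts) != 3 or not all(parts):
        return "expected SPECIESCODE_INSTITUTE_ID"
    if parts[0] not in SPECIES_CODES:
        return f"unknown species code {parts[0]!r}"
    return None


for candidate in ["ECA_ROSLIN_H1", "XXX_ROSLIN_H1", "ECA-ROSLIN-H1"]:
    print(candidate, "->", check_sample_name(candidate) or "ok")
```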

Sample description

Briefly describe the sample.

Project

Always use "FAANG" as the project so that your samples are in the correct group in BioSamples.  The project tag is important, as the data coordination centre (DCC) will use it to find your records.

Specimen material

For recording specimen material on the specimen tab please use:

Material | Term Source REF | Term Source ID
specimen from organism | OBI | OBI_0001479

Availability

This is optional for all sample types. Use this to inform people how they can get access to the sample. This can either be as a URL, or an e-mail address to contact for further information. This information will persist in the archives, so please choose e-mail addresses or web sites that will be available for a long time. These are example values for the availability attribute (non-FAANG):

• https://cells.ebisc.org/EDi001-A/

• mailto:samplesgroup@example.ac.uk

Some metadata attributes permit multiple values. For example, you may wish to describe several health status traits for an animal. In this case, insert extra columns for the attribute into the template. Take care to copy all related columns. e.g. for an animal with foot and mouth disease and endocarditis, you would use six columns:

health status | Term Source REF | Term Source ID | health status | Term Source REF | Term Source ID
endocarditis | EFO | EFO_0000465 | foot and mouth disease | EFO | EFO_0007277
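
If you build your spreadsheet from another data source, the repeated column groups can be generated rather than copied by hand. This is a minimal sketch with a hypothetical helper name; it only shows how one attribute with several values expands into repeated groups of three columns.

```python
def expand_multi_value(attribute, values):
    """Expand (text, REF, ID) triples into repeated header and data cells."""
    headers, row = [], []
    for text, source_ref, source_id in values:
        # Every value needs its own copy of all three related columns.
        headers += [attribute, "Term Source REF", "Term Source ID"]
        row += [text, source_ref, source_id]
    return headers, row


headers, row = expand_multi_value(
    "health status",
    [
        ("endocarditis", "EFO", "EFO_0000465"),
        ("foot and mouth disease", "EFO", "EFO_0007277"),
    ],
)
print(len(headers))  # 6
print(row[3])        # foot and mouth disease
```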

Where there isn’t a term that is precisely right, consider using the ID for a term that is close enough, combined with text to say exactly what you mean. This will be marked with a warning at validation, but you can choose to ignore this. You can also request a new term from the ontology. Guidance on this is available on the FAANG wiki. Contact FAANG DCC if assistance is needed. As requesting a new term can sometimes take a while to resolve, you can use the best approximation first, and update with the more accurate term at a later date. You don’t have to just wait for the new term to be assigned. 

Guidelines on describing crossbred animals

There are additional instructions for describing crossbred animals.

Derived from / Child of 

Used to describe familial relationships between samples or what samples were derived from. For example, a tissue specimen is derived from an animal.

These must either:

    • Refer exactly to the 'Sample Name' of an appropriate sample within the same Excel spreadsheet, or
    • Refer to an appropriate sample that already exists in the BioSamples database using its BioSample ID, e.g. 'SAMEA4447799'.
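
The two permitted forms of reference can be checked with a few lines of code. This is a sketch under stated assumptions: the function name is hypothetical, and the regular expression for BioSample IDs is an approximation of the usual accession shape (for example 'SAMEA4447799'), not an official definition. It cannot tell whether an accession really exists in BioSamples.

```python
import re

# Assumption: approximate shape of a BioSample accession such as 'SAMEA4447799'.
BIOSAMPLE_ID = re.compile(r"^SAM[END][AG]?\d+$")


def unresolved_references(sample_names, derived_from_values):
    """Return references that are neither a known Sample Name nor shaped like a BioSample ID."""
    known = set(sample_names)
    return [
        value
        for value in derived_from_values
        if value not in known and not BIOSAMPLE_ID.match(value)
    ]


print(unresolved_references(
    ["ECA_ROSLIN_H1"],
    ["ECA_ROSLIN_H1", "SAMEA4447799", "eca_roslin_h1"],
))
# ['eca_roslin_h1']
```

The last value is reported because the match against 'Sample Name' is exact, including capitalisation.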

Specimen collection protocol / Purification protocol / Cell culture protocol / Culture protocol

The FAANG sample and experiment metadata specification requires your protocols to be publicly available.

It is best to host these in a location that will remain available in the long term. Locations such as lab pages are inadvisable, as web addresses change and hosting goes away.

FAANG is happy to host protocols in the FAANG FTP site protocols directory.

If you wish FAANG to host your protocols, please send PDF copies of your protocols to FAANG DCC. Please consult the FAANG metadata validation privacy notice for the processing of your personal information in relation to protocol submission https://www.ebi.ac.uk/data-protection/privacy-notice/faang-metadata-validation

Please name your files using the convention INSTITUTE_SOP_PROTOCOLNAME_YYYYMMDD.pdf, e.g. INRA_SOP_sorting_swine_CD_cells_20160504.pdf. This is a protocol for sorting swine CD cells; it comes from the French National Institute for Agricultural Research and was written on the 4th May 2016.

Please check the organisation/group abbreviation page to find or add your organisation/group abbreviation for consistency across institutes.
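
The file naming convention for protocols can be sketched as a small helper. The function name is hypothetical, and replacing spaces and punctuation in the protocol name with underscores is an assumption based on the INRA example above.

```python
import re
from datetime import date


def protocol_filename(institute: str, protocol_name: str, written: date) -> str:
    """Build a name of the form INSTITUTE_SOP_PROTOCOLNAME_YYYYMMDD.pdf."""
    # Assumption: runs of non-alphanumeric characters become single underscores.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", protocol_name.strip()).strip("_")
    return f"{institute}_SOP_{slug}_{written:%Y%m%d}.pdf"


print(protocol_filename("INRA", "sorting swine CD cells", date(2016, 5, 4)))
# INRA_SOP_sorting_swine_CD_cells_20160504.pdf
```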

If you have any questions about protocols and the form they should take, please email the FAANG Animal, Samples and Assays group

Locations

Locations should consist of a name, longitude and latitude in decimal degrees. Longitude ranges from -180° (West) to 180° (East) and latitude from -90° (South) to 90° (North). When reporting a location, please choose an appropriate level of precision. For example, if you know only the country of birth for an animal, a precision of 1 degree is sufficient.

precision | qualitative | quantitative (at equator)
1 | country or large region | 100 km
0.1 | large city or district | 10 km
0.01 | town or village | 1 km
0.001 | neighbourhood, street | 100 m
0.0001 | individual street, land parcel | 10 m
0.00001 | individual trees | 1 m
0.000001 | individual kittens | 100 mm
0.0000001 | practical limit of commercial surveying | 10 mm
0.00000001 | specialized surveying (e.g. tectonic plate mapping) | 1 mm
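
Coordinates exported from a GPS device or database often carry more decimal places than are meaningful. The sketch below, with a hypothetical function name, rounds a location to a chosen number of decimal places and rejects values outside the valid ranges for decimal degrees (latitude between -90 and 90, longitude between -180 and 180).

```python
def clean_location(name: str, latitude: float, longitude: float, decimals: int):
    """Validate a location and round it to the requested number of decimal places."""
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude {latitude} is outside -90..90")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude {longitude} is outside -180..180")
    # decimals=1 corresponds to the 0.1 row of the table (large city or district).
    return name, round(latitude, decimals), round(longitude, decimals)


print(clean_location("Edinburgh", 55.9533, -3.1883, 1))
# ('Edinburgh', 56.0, -3.2)
```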


Specific metadata instructions for tracking embryos and pregnancy

For samples taken from pregnant animals:

  • Include that the animal was 'pregnant' in the sample description.
  • The 'animal age at collection' is the age of the mother
  • The gestational age and unit should also be supplied.

For embryo samples, on the specimen tab 'Tissue' should be listed as "embryo", the 'Term Source REF' as "UBERON" and the 'Term Source ID' as "UBERON_0000922". The 'Animal age at collection' should list the embryo's age in days since conception.

Adding custom columns to your spreadsheet

It is possible to add additional custom fields to your BioSamples records. Simply add the columns to the end of the appropriate tab.

Please contact FAANG DCC with your requirements, as it may be something we wish to support by default in future.

If it is possible to attach an ontology term to control the nomenclature of these custom fields, then 'Term Source REF' and 'Term Source ID' columns should also be added. If you are using an ontology not listed by default under the term source tab, then it needs to be added to the term source tab and you also need to contact FAANG DCC to let us know which ontology you have additionally used.

Missing values 

Where data cannot be included in a submission, submit one of these text values instead

  • 'not applicable'
  • 'not collected' (i.e. will always be missing)
  • 'not provided' (i.e. may be added later)
  • 'restricted access' (i.e. it isn't missing, we just can't include it in a public document) 

The use of these values will interact with the metadata validation system as follows:

  • if an attribute is required
    • not applicable, not collected, not provided - validation will regard these as an error
    • restricted access - validation will generate a warning
  • if an attribute is recommended
    • not collected, not provided - validation will generate a warning
    • restricted access, not applicable - pass
  • if an attribute is optional
    • validation will fail with any of the missing value terms. As this is an optional field, it should be left blank if no real data is being provided.

If an attribute is optional and you can’t supply it, you should just leave the column blank.
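
The interaction between missing value terms and the requirement level of an attribute can be summarised as a lookup table. This sketch mirrors the rules listed above; it is not the validator's own code, and the names used are hypothetical.

```python
MISSING_TERMS = ("not applicable", "not collected", "not provided", "restricted access")

# Outcome for each requirement level and missing value term, as described above.
OUTCOMES = {
    "required": {
        "not applicable": "error",
        "not collected": "error",
        "not provided": "error",
        "restricted access": "warning",
    },
    "recommended": {
        "not collected": "warning",
        "not provided": "warning",
        "restricted access": "pass",
        "not applicable": "pass",
    },
    # Optional attributes should be left blank instead of using a missing value term.
    "optional": dict.fromkeys(MISSING_TERMS, "error"),
}


def missing_value_outcome(level: str, value: str):
    """Return 'error', 'warning' or 'pass', or None if value is not a missing value term."""
    if value not in MISSING_TERMS:
        return None  # real data; checked by other validation rules
    return OUTCOMES[level][value]


print(missing_value_outcome("required", "restricted access"))   # warning
print(missing_value_outcome("recommended", "not applicable"))   # pass
print(missing_value_outcome("optional", "not provided"))        # error
```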

Assume that the DCC will ask about anything that seems implausible. e.g. ‘restricted access’ for species would be queried.

Pools of specimens

Each specimen within the pool should have its own specimen record.  For each specimen in the pool add its sample name (if detailed in the same file), or BioSample ID if it already exists in the BioSamples database, to the 'derived from' field.  Add as many 'derived from' fields as are required to record all of the specimens that are part of the pool.

3. Validation of samples

The filled out template can be validated against the FAANG rules, using the on-line tool. It will provide a report highlighting problems for review either within your web browser or downloadable for view within Excel. This is under development, so please query FAANG DCC if you have any concerns about the validation result.

Click on the entities tab to see the details of what passed and what failed. The details of errors and warnings can be found in the second column (web result) or in the comment (downloaded Excel file).

‘Errors’ are problems that have to be dealt with. You will not be able to convert the FAANG sample spreadsheet to the BioSamples sample tab format if the spreadsheet contains errors. ‘Warnings’ are items for you to review. They might be fine, but you need to decide. Any warnings left in a submission are likely to be reviewed by the FAANG DCC. You may be asked to update the sample record later if the metadata group agrees a certain value should be improved.

As well as the content validation, the report also has three summary tabs. These tabs provide summaries of the usage of 1) terms, 2) units and 3) ref+id in the spreadsheet. This aims to help discover inconsistent capitalisation, spacing or use of units, which will cause confusion when computers read the metadata. 
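
The kind of inconsistency that the summary tabs help to reveal can also be found with a short script before you upload. This is an illustrative sketch with a hypothetical function name, not a description of how the validation tool builds its summaries: it groups values that differ only in capitalisation or spacing.

```python
from collections import defaultdict


def inconsistent_spellings(values):
    """Group values that differ only by case or whitespace; return groups with more than one spelling."""
    groups = defaultdict(set)
    for value in values:
        # Normalise case and collapse runs of whitespace to build the grouping key.
        key = " ".join(value.split()).lower()
        groups[key].add(value)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}


print(inconsistent_spellings(["Thoroughbred", "thoroughbred", "years", "years", "Years "]))
# {'thoroughbred': ['Thoroughbred', 'thoroughbred'], 'years': ['Years ', 'years']}
```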

For descriptions and explanations of the different error messages that the validation tool can provide please see FAANG validation error message explanations.

Having run the validation tool on your spreadsheet, you will need to update it to deal with the errors shown. Review the warnings and consider making changes to deal with these. Re-validate your spreadsheet, and repeat the process until there are no errors left and you are comfortable with everything that has triggered a warning. If there are some things that you cannot resolve, contact FAANG DCC for help. The three summary tabs give you an overview of the types of values and data that you have submitted. They are worth checking through to ensure your metadata is consistent and there are no surprising values, particularly for large submissions. Eventually, you will have a set of metadata that passes the validation checks and is ready for conversion.

4. Conversion to SampleTab

The spreadsheet we’ve worked with so far is not suitable for submission to BioSamples. We need to convert it to SampleTab, the BioSamples submission format. The conversion tool not only does the conversion but also does a final validation.

Resolve any errors from this final validation step and the tool will generate the SampleTab for download. Save it to disk.

For descriptions and explanations of the different error messages that the validation tool can provide please see FAANG validation error message explanations.

We are happy to help with any errors or warnings, please contact FAANG DCC.

5. Submission of SampleTab to BioSamples

To submit to BioSamples you will require an API key.  These should be issued at organisation/group level, rather than to every individual, so if your group has submitted samples to FAANG previously it should already have been issued a key.  If you need a new API key or would like to check whether your organisation/group already has one, please contact the BioSamples team.

When you are ready to submit and have obtained your API key, please submit your SampleTab using this page. 

If successful, this will provide you with a new SampleTab file complete with your sample accessions.  Please keep a copy of your returned SampleTab file, as it will make future updates to your records easier.  During peak times it can take a few days for your samples to appear on the BioSamples public database.

BioSamples provide extensive documentation on how to submit.

6. Updating existing BioSamples records

If you wish to make an update to an existing BioSamples record to perhaps add a publication, update a protocol or to correct a mistake you can use the same SampleTab submission service described above with some additional steps described below: 

  1. For minor edits you can edit the SampleTab file that BioSamples returned for your submission. For more major changes you can make a new SampleTab file from your Excel template so that you can run it through validation again.
    1. You must use the same SampleTab or template file that you used for the original submission.  If you have lost this, please contact FAANG DCC as we may be able to help reconstruct it for you.
  2. Make the required updates to your template file and run through the validation process as above if required.
    1. IMPORTANT: The process described here will update any field in the BioSamples record that differs from your SampleTab file, so you should ensure that you reuse the template from the original submission and that you do not make changes to any fields other than the ones you intend to update.
  3. You need to add the submission identifier from your original submission to the 'Submission Identifier' field in the SampleTab file or under the submission tab in your Excel template file.  The submission identifier can be found on the group accession page for your BioSample record. Go to the BioSamples webpage and search using the BioSample ID that you wish to update.  Click the BioSample ID next to 'groups' and the submission identifier will be listed on the linked page, e.g. Submission Identifier: GSB-528.
  4. If required, convert your file to SampleTab using the on-line tool.
  5. Submit the updated SampleTab file to BioSamples using the same API key as the original submission, as only the API key that owns the original submission is allowed to update the records.  If you have lost your API key please contact the BioSamples team.
  6. Your update should appear on your BioSamples record within a few days.

IMPORTANT: The process described here will update any field in the BioSamples record that differs from your SampleTab file, so you should ensure that you reuse the SampleTab or template from the original submission and that you do not make changes to any fields other than the ones you intend to update.  Any sample from the original submission that is not present in the updated template will be deleted.

Practical steps to make a change

1. Retrieve the 'old' Excel template and the 'old' sampletab file returned from BioSamples after submission (this file has the correct accessions but holds the information to be updated)

2. Update the Excel template file, do the validation and conversion

3. Submit the converted file to BioSamples test server at https://wwwdev.ebi.ac.uk/biosamples/sampletab/ using the same API key

4. A sampletab will be downloaded automatically. This sampletab contains the updated information but with the wrong accessions; it is referred to as the 'new' sampletab below

5. Update the new sampletab file to replace the wrong accessions with the correct ones

a. Release date and update date: these appear in only one place

b. Group accession: there are two fields to update, one in the form GSB-**** (it appears once more than the number of samples) and the other in the form SAMEG******** (it appears once per sample)

c. Individual record accessions: remember to update these in the 'derived from' fields as well

6. Save the new sampletab file

7. Optional: double check by comparing two sampletab files

8. Submit the new file to BioSamples

Note: these steps are for complex curations only. For a simple change, e.g. fixing a typo, it is much easier to edit the 'old' sampletab file directly and submit that.
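
Replacing accessions by hand is error-prone for large submissions, so steps 5 and 7 can be scripted. The following is a rough sketch, not a supported tool: the function names are hypothetical, the mapping from wrong to correct accessions has to be prepared by you, the accessions other than GSB-528 and SAMEA4447799 are invented for illustration, and the sample text is not a real SampleTab layout. Always inspect the result before submitting.

```python
import difflib
import re


def replace_accessions(text: str, wrong_to_correct: dict) -> str:
    """Swap each wrong accession for its correct one, matching whole accessions only."""
    if not wrong_to_correct:
        return text
    # Longest keys first plus word boundaries, so 'SAMEA100' never matches inside 'SAMEA1000'.
    keys = sorted(wrong_to_correct, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(key) for key in keys) + r")\b")
    return pattern.sub(lambda match: wrong_to_correct[match.group(0)], text)


def show_differences(old_text: str, new_text: str) -> list:
    """Return unified diff lines between two sampletab texts (empty if identical)."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(), "old", "new", lineterm=""
    ))


# Hypothetical accessions issued by the test server, mapped to the real ones.
mapping = {"GSB-999": "GSB-528", "SAMEA100": "SAMEA4447799", "SAMEA1000": "SAMEA4447800"}
new_text = "GSB-999\tSAMEA1000\tSAMEA100"
print(replace_accessions(new_text, mapping))
```

Comparing the corrected 'new' file against the 'old' file with show_differences should then list only the fields you intended to change.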

If in doubt please contact FAANG DCC for help, to revert unintentional changes or recover deletions.

Where to find help

If you have issues with or need help with BioSamples, check the archive’s help pages or email the helpdesk.

If you need guidance on preparing metadata and using the validation tools, contact FAANG DCC.

If you have suggestions for new metadata requirements or changes to the existing specification, please email the FAANG metadata and data sharing working group.
