Child pages
  • FAANG COST Chip-Seq Training School Wageningen 27th June 2019

FAANG aims to identify functional elements in animal genomes to create a rich genome to phenome resource. This will require the integration of datasets from multiple groups across the world and from a number of different species. To do this effectively requires high-quality information about the animals and samples studied, the experimental datasets generated and the analyses performed.

This workshop will guide you through a simulated FAANG sample submission and the preparation of experimental metadata, using a real dataset.  You will complete a FAANG metadata template spreadsheet, populate it with rich metadata descriptions, identify appropriate ontologies to describe your samples, validate it against the FAANG metadata standards, convert it to the BioSamples submission format and submit your samples to the BioSamples test site. You will also familiarise yourself with the information required to meet the FAANG experimental data standard in preparation for submitting your own datasets.

If you have any questions or need help at any stage, please ask your trainer.  This page will remain active for a few weeks so you can refer back to it later, but please note that the API key provided to you later in the workshop will only be active for this week. To submit your real data to the test and production databases, you can apply for your own API key if your lab group does not already have one.


Resources you might need

The FAANG metadata rulesets - https://www.ebi.ac.uk/vg/faang/rule_sets/

The FAANG validation and conversion service - https://www.ebi.ac.uk/vg/faang/validate/

The ontology lookup service - https://www.ebi.ac.uk/ols/index

Zooma - https://www.ebi.ac.uk/spot/zooma/

BioSamples - https://www.ebi.ac.uk/biosamples/

The FAANG data portal - http://data.faang.org/


Workshop activities

Part 1 Sample Submission

  1. Download this partially completed FAANG metadata template.  You would normally start your own submission with a blank template, but for today we have completed some of the information for you in the interest of time. This dataset was kindly shared by Wageningen University. FAANG_CONST_training_partial_20190627.xlsx

  2. Open the downloaded file in Microsoft Excel (or use Google Sheets if Excel is not available to you).  Navigate through the tabs and get a sense of where the different fields from the FAANG metadata rulesets (https://www.ebi.ac.uk/vg/faang/rule_sets/) are recorded. The template has separate tabs for recording the organisms, specimens and cell cultures. The rulesets are also available on GitHub in JSON format (https://github.com/FAANG/faang-metadata/tree/master/rulesets).

  3. In the partially completed metadata template file, add your own contact information on the person and organisation tabs. Can you work out what to put in the role columns? Hint: each role needs to be one of the ontology text names from https://www.ebi.ac.uk/ols/ontologies/efo/terms?iri=http%3A%2F%2Fwww.ebi.ac.uk%2Fefo%2FEFO_0002012 (for example, your person role could be "submitter" or "investigator" and your organisation role could be "institution"). When you make your own submissions in the future you can list additional people (PIs and collaborators) on subsequent lines.

  4. You will notice on the specimen tab that the ontology terms are missing for "organism part". Use the Ontology Lookup Service (https://www.ebi.ac.uk/ols/index) to identify appropriate ontology terms for the missing liver and spleen values (e.g. what is the ontology term for liver?). Once you find an appropriate term you will need to record both the Term Source Ref and Term Source ID in the spreadsheet. Look at some of the other columns in the spreadsheet for the appropriate format. You will likely receive multiple results from different ontologies; look in the FAANG sample ruleset to see which ontology we prefer for this field. Once you have completed the first couple of organism part fields, try using the Zooma tool (https://www.ebi.ac.uk/spot/zooma/) to speed up a few of the remaining ones. This tool allows you to query multiple ontologies at once by entering a different text term on each line. It suggests what it considers the most appropriate term in each case and reports its confidence in that match. (Hint: you can download the results from Zooma as TSV so you can quickly copy them into your template.)

  5. Once you have identified an appropriate ontology term for each of the missing organism parts, let's check how we are doing against the FAANG rules. Validate your spreadsheet using the FAANG validation service (https://www.ebi.ac.uk/vg/faang/validate/). Make sure to select that you are using the "FAANG Samples" ruleset and the "BioSample .xlsx" file format.

  6. The results will show that the spreadsheet contains further errors that need to be fixed before you can continue. Have a look at what the different tabs tell you. If you hover over an error, it will give you more information about it.  Work through each error, see if you can work out how to resolve it, and make the required changes to your spreadsheet. You can rerun the validation service after each change to check that you have resolved the error.

  7. Validate your spreadsheet using the FAANG validation service until you have fully passed validation or have warnings that you feel can be safely ignored.

  8. You can now convert your spreadsheet into the format required for submission to BioSamples.  Use the FAANG conversion service (https://www.ebi.ac.uk/vg/faang/convert) to create a .SampTab.tsv file. Have a look at this file; you can see all of your data and some additional header information. (For the computationally minded, BioSamples now also accepts JSON, and this converter will switch to producing JSON later this year.)

  9. Submitting data to BioSamples requires a unique key that identifies you as a submitter and gives you permission to make submissions. For today's workshop, get your API key from your workshop trainer; there should be a spreadsheet with all the keys (https://docs.google.com/spreadsheets/d/1UrivtesFwFiW40l38dfYDd_p3a7XlnzxdAZasBlXjbY/edit?usp=sharing).  This key is only valid for the test (not the real) service and will only be valid for this week. For real submissions you will use an existing key that your lab group already owns (ask your colleagues) or you can request a new key from the BioSamples team at EMBL-EBI.

  10. Navigate to the BioSamples test submission page (http://wwwdev.ebi.ac.uk/biosamples/sampletab/submission), where you will provide your API key. Note the 'wwwdev' in the web address, which indicates that you are using the test service. Make sure you are using this service rather than the real production service (www.). Your key for today will not work on the live production version of BioSamples.

  11. Type in your API key.

  12. Select the .SampTab.tsv file that was generated by the FAANG conversion service for upload to BioSamples.

  13. If successful, this will return an annotated SampleTab file to you as a download. Have a look at this file and compare it to the one you submitted; you will see that each of your records has been assigned a unique BioSample accession. Your data is now in the test archive.

  14. It can take a few minutes for your samples to be processed into the BioSamples database, so you can grab a coffee while you wait or ask your trainer any questions you have now that you have used the service.

  15. After a few minutes, navigate to http://wwwdev.ebi.ac.uk/biosamples/samples/<insert one of your BioSample accessions here> and you should see all of your data (as this is the test server, your records will be deleted at the end of the week).

  16. Use the relationship links at the bottom of the BioSamples record to navigate between the linked BioSamples, noting which is the animal and which is the specimen record. When you make submissions to the real production version of BioSamples then the data will appear in the FAANG data portal the following day.
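
For the computationally minded, two of the manual checks in Part 1 can be sketched in Python. This is a minimal sketch and not part of the FAANG tooling: the function names are made up for illustration, and the accession used in the note below is a placeholder. It also assumes that the Term Source Ref is the prefix before the underscore and that the Term Source ID is the full short form; check this against the columns that are already filled in within your template.

```python
from urllib.parse import urlparse

# Base address of sample records on the BioSamples test service (from step 15).
TEST_SAMPLES_BASE = "http://wwwdev.ebi.ac.uk/biosamples/samples/"


def split_term_iri(iri):
    """Split an ontology term IRI into (source ref, term ID).

    Assumption: the last path segment looks like PREFIX_NUMBER,
    e.g. ".../EFO_0002012" gives ("EFO", "EFO_0002012").
    """
    short_form = iri.rstrip("/").rsplit("/", 1)[-1]
    prefix, separator, _number = short_form.partition("_")
    if not prefix or not separator:
        raise ValueError(f"unexpected term IRI: {iri!r}")
    return prefix, short_form


def is_test_service(url):
    """Return True when the address points at wwwdev.ebi.ac.uk (the test service)."""
    return urlparse(url).hostname == "wwwdev.ebi.ac.uk"


def record_url_on_test_server(accession):
    """Build the test-server address for one BioSample accession."""
    return TEST_SAMPLES_BASE + accession.strip()
```

For example, split_term_iri("http://www.ebi.ac.uk/efo/EFO_0002012") returns ("EFO", "EFO_0002012"), and is_test_service returns False for any address on www.ebi.ac.uk. Pass one of your own accessions from the annotated SampleTab file to record_url_on_test_server to get the address used in step 15.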


Part 2 Experiment submission

  1. Download this example experiment template file, FAANG_COST_training_experiment_examples_20190627.xlsx, which contains data from the Roslin Institute, and have a think about how you might record some of the information from your own completed or planned experiments. If you want to have a go at this in more detail and start collecting the terms you will need for your own submission, there is also an empty template file: faang_experiment_metadata_empty_template_20181129.xlsx (or Google Sheets https://drive.google.com/open?id=1dK30N-_nTVZ56768_F_M0NoWpsKRvJUpwG4aWLRgzL8)

  2. Validate FAANG_COST_training_experiment_examples_20190627.xlsx using the FAANG validation service (https://www.ebi.ac.uk/vg/faang/validate/), this time against the experiment rulesets and with the format set to SRA Experiment XLSX.  This may take some time on a slow internet connection or with large files.  Have a look at the different tabs showing the detailed information provided and the ontologies used.

  3. You can now convert your spreadsheet into the format required for submission to ENA using the conversion service (https://www.ebi.ac.uk/vg/faang/convert). This may take some time on a slow internet connection or with large files. Have a look at the files it generates. You can see this is a different format to the BioSamples file, which is why we are planning to bring everything together in a new unified submissions interface. This metadata XML would be submitted along with the data files to the European Nucleotide Archive. We don't have a test version available for you today, but full instructions on this last step are available in the FAANG Archive Submission guidelines for when you are ready to do this with real data.

  4. Finally, navigate to the FAANG data portal (http://data.faang.org/), where all FAANG data from the public archives will be indexed.  See if you can find the real versions of the sample records that you have been doing test submissions on today (Hint: use the 'Organisms' or 'Specimens' view to look for key metadata values that match the data, and try using the filters).
    1. Follow the links within the sample records to navigate to the BioSamples Database to view the full records there.
    2. Have a look at the sampling collection protocol for one of these records. You will have to create detailed protocols like these as part of your FAANG submission. They are stored on the FAANG FTP server to ensure they are available in the long term.
    3. Can you find any RNA-Seq data files from pigs in the data portal?
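
If you would like a quick overview of the tabs in a template without opening it in Excel, the sketch below lists them using only the Python standard library. It relies on an .xlsx file being a zip archive whose xl/workbook.xml part names each sheet. The function name is made up for illustration, and workbooks saved in the less common strict Open XML variant use a different namespace, so they would return an empty list.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the main SpreadsheetML part inside a standard .xlsx archive.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def list_sheet_names(xlsx_file):
    """Return the tab names of an .xlsx workbook, in workbook order.

    xlsx_file may be a path or an open binary file object.
    """
    with zipfile.ZipFile(xlsx_file) as archive:
        with archive.open("xl/workbook.xml") as handle:
            root = ET.parse(handle).getroot()
    # Each <sheet> element carries the tab name in its "name" attribute.
    return [sheet.get("name") for sheet in root.iter(MAIN_NS + "sheet")]
```

Calling list_sheet_names("FAANG_COST_training_experiment_examples_20190627.xlsx") on the downloaded example file should give you the tab names to compare against the experiment ruleset.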


Ask the trainer any remaining questions you might have or for specific help in selecting ontologies for your own real datasets.

Going forward

You should hopefully now be better prepared to meet the FAANG metadata standards for future data that you generate and to provide this to FAANG through the public archives.  You can refer to the full FAANG Archive Submission guidelines for further information on complete submissions.

You can also contact the FAANG Data Coordination Centre faang-dcc@ebi.ac.uk if you require any advice or assistance with your own submissions.

Wellcome Genome Campus Conference Centre
