ENA flat file validator

You can download and install the ENA stand-alone validator to validate your flat files before uploading them to ENA. The latest and previous validator jars can be found here: https://mvnrepository.com/artifact/uk.ac.ebi.ena.sequence/embl-api-validator. Source code is available here: https://github.com/enasequence/sequencetools. Java programmers can also find the embl-api-core (parser, data model and writer) and embl-api-ff (validator and fixer) artifacts in Maven Central.

The jar is called embl-api-validator-<version>.jar (e.g. embl-api-validator-1.1.1.jar) and is used to validate flat files from the command line. The validator changes frequently, so please check back every few weeks to see whether a newer version is available. The validator does not modify your files; it produces a series of reports that highlight any problems with your submission.

Running the validator

To run the validator, you need Java 1.8. Because you are running the validator outside the EBI firewall, certain checks that rely on our databases cannot be run. You must therefore run the validator with the -r option, which indicates that you are running it remotely and excludes these checks. We intend to add remote access for these checks in due course.
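Before running the validator, it can help to confirm which Java version is on your path. The sketch below is illustrative only: the function names is_java8 and check_java are hypothetical, and it assumes that the first line printed by java -version contains a quoted version string beginning with 1.8, as Java 8 runtimes normally print.

```shell
# Hypothetical helper: succeeds if a "java -version" line reports Java 1.8.
# Assumption: the line contains text such as: version "1.8.0_292"
is_java8() {
    case "$1" in
        *'version "1.8.'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: inspects the java found on the PATH.
# "java -version" writes to standard error, hence the 2>&1 redirection.
check_java() {
    version_line=$(java -version 2>&1 | head -n 1)
    if is_java8 "$version_line"; then
        echo "Java 1.8 found: $version_line"
    else
        echo "Java 1.8 not found; got: $version_line" >&2
        return 1
    fi
}

# Usage: check_java && java -jar embl-api-validator-1.1.1.jar -r file1.txt
```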

Run the validator by passing it a file name, several file names, or a wildcard that matches many file names. Example commands:

  • for two files, file1.txt and file2.txt
    java -jar embl-api-validator-1.1.1.jar -r file1.txt file2.txt
  • for all files ending in .txt
    java -jar embl-api-validator-1.1.1.jar -r *.txt

You can set the reporting level with the -l argument, which takes the value 0, 1 or 2: 0 is silent, 1 is summary only, and 2 is full on-screen progress reports. For example, to validate all files ending in .txt with full progress reports:

java -jar embl-api-validator-1.1.1.jar -r -l 2 *.txt
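In most shells, a wildcard that matches no files is passed to the command unchanged, so the validator would receive the literal text *.txt. A minimal sketch of a wrapper that guards against this is shown below. The function name validate_flatfiles is hypothetical, and the wrapper uses only the -r and -l options described above.

```shell
# Hypothetical wrapper: refuse to start the validator when no input file exists.
# First argument: path to the validator jar; remaining arguments: flat files.
validate_flatfiles() {
    jar="$1"
    shift
    # No arguments at all, or an unmatched wildcard left as literal text.
    if [ "$#" -eq 0 ] || [ ! -e "$1" ]; then
        echo "no input files matched" >&2
        return 2
    fi
    # -r: remote mode; -l 1: summary-only reporting.
    java -jar "$jar" -r -l 1 "$@"
}

# Usage: validate_flatfiles embl-api-validator-1.1.1.jar *.txt
```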

Validator output

A summary is shown on screen at the end of the run. It lets you see quickly whether there were any errors in your files and whether the full error reports, which are written to text files, contain anything of note. Validation messages have three severities: info, warning and error.

  • info: no problem with data, just a comment
  • warning: data can be submitted, but you may be able to improve the quality of the information
  • error: data is invalid and must be corrected

If a message occurs many times, it is shown once in the "compressed messages" section. All other messages are shown below the compressed messages.

The "file summary" section shows the results for each file run through the validator, and the "summary" section shows an overall count of all checked and failed entries. Ignore the fixed and unchanged entry counts: these apply only to "fix" mode, which you will not be using.

The validator produces 5 output files:

  • VAL_INFO.txt: all info and warning level messages
  • VAL_FIXES.txt: for when used in fix mode; this can be ignored
  • VAL_ERROR.txt: all error messages
  • VAL_REPORTS.txt: some validation messages include extended textual content. For example CDS translation errors include a translation report. Permitted controlled vocabulary values can also be included here
  • VAL_SUMMARY.txt: the on-screen summary seen after running the validator is also saved into this file.
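After a run, a quick scripted check of VAL_ERROR.txt can tell you whether anything must be corrected. The sketch below rests on two assumptions that are not stated above: that the output files are written to the directory you ran the validator from, and that VAL_ERROR.txt is empty or absent when no errors were found. The function name check_validation is hypothetical.

```shell
# Hypothetical helper: report whether VAL_ERROR.txt contains any messages.
# Assumption: the report files are in the directory given as the first
# argument (default: current directory), and VAL_ERROR.txt is empty or
# missing when the validator found no errors.
check_validation() {
    report_dir="${1:-.}"
    # -s is true only if the file exists and has a size greater than zero.
    if [ -s "$report_dir/VAL_ERROR.txt" ]; then
        echo "errors found, see $report_dir/VAL_ERROR.txt"
        return 1
    fi
    echo "no errors reported"
    return 0
}

# Usage: check_validation . || less VAL_ERROR.txt
```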
