SampleTab file format
The BioSamples Database uses the SampleTab file format for submissions. SampleTab is a tab-delimited text format that can be created in many spreadsheet applications (e.g. Microsoft Excel), so users from a wide range of backgrounds can write it with tools they are already familiar with.
The format also has a number of advanced features for power users, such as ontology mappings, anonymous groups, and UTF-8 character encoding.
- Dates and times should be in ISO 8601 format, to an appropriate level of precision, i.e. YYYY-MM-DD HH:MM:SS.ss
- US English spellings should be used, and abbreviations are discouraged.
- Filenames should end with .txt
- Cells are separated by tabs, and only by tabs. A cell must not contain a tab, even if the tab is quoted or escaped.
- If a data cell is empty and data for that tag is optional, the cell is treated as providing no data for that tag. If data for that tag is required but missing, the file is invalid.
- Within a cell, leading and trailing whitespace is stripped. Empty cells at the end of a line are also stripped. Line endings can be in any format (Windows, Linux, Mac), but Linux line endings (LF) are preferred.
- Empty lines are ignored, including lines composed only of whitespace (including tabs).
- If a cell begins and ends with double quotes, the quotes may be stripped.
- If the first character on a line is #, then that line is ignored. Use of # characters elsewhere is allowed, but discouraged.
- The use of only alphanumeric characters (upper and lowercase A-Z and 0-9, no accents or other symbols) is preferred.
- Files should be UTF-8 encoded.
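
The line-level rules above can be sketched in a few lines of Python. This is only an illustration, not the parser used by the BioSamples Database: the function names parse_sampletab_lines and strip_cell are hypothetical, and the order in which the rules are applied (comment check on the raw line, then whitespace stripping, then quote stripping) is an assumption, since the rules do not state one.

```python
import re


def strip_cell(cell):
    """Normalise one cell: trim whitespace, then drop surrounding quotes."""
    cell = cell.strip()
    # Quotes "may be stripped"; this sketch assumes they always are.
    if len(cell) >= 2 and cell.startswith('"') and cell.endswith('"'):
        cell = cell[1:-1]
    return cell


def parse_sampletab_lines(text):
    """Split already-decoded text into rows of cells (hypothetical helper)."""
    rows = []
    # Accept Windows (CRLF), old Mac (CR) and Linux (LF) line endings.
    for line in re.split(r"\r\n|\r|\n", text):
        # A line is ignored only if its very first character is '#'.
        if line.startswith("#"):
            continue
        # Cells are separated by tabs and only by tabs.
        cells = [strip_cell(cell) for cell in line.split("\t")]
        # Empty cells at the end of a line are stripped.
        while cells and cells[-1] == "":
            cells.pop()
        # Empty or whitespace-only lines leave no cells and are ignored.
        if cells:
            rows.append(cells)
    return rows
```

To apply this to a file, the text could be read with open(path, encoding="utf-8", newline="") so that the bytes are decoded as UTF-8 and the original line endings reach the function unchanged. Checks that depend on the meaning of a tag, such as whether an empty cell is optional or whether a date is valid ISO 8601, are outside the scope of this sketch.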