Many bioinformatics databases - and indeed, many large, complex databases in all walks of life - are relational databases. Put simply, a relational database comprises two or more separate tables, with explicitly defined relationships linking the tables together via key fields.
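To make the idea of tables linked by key fields concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The schema, table names (`organism`, `protein`) and identifiers (`P1`, `P2`) are hypothetical and chosen only for illustration; they are not taken from any real bioinformatics resource.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign-key relationships when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: two separate tables.
conn.execute(
    "CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE protein (
        protein_id  TEXT PRIMARY KEY,
        name        TEXT NOT NULL,
        -- key field that explicitly links each protein to one organism
        organism_id INTEGER NOT NULL REFERENCES organism (organism_id)
    )
    """
)

# The organism name is stored once; each protein refers to it by key.
conn.execute("INSERT INTO organism VALUES (1, 'Homo sapiens')")
conn.executemany(
    "INSERT INTO protein VALUES (?, ?, ?)",
    [("P1", "example protein A", 1), ("P2", "example protein B", 1)],
)

# Follow the relationship between the tables with a join on the key field.
rows = conn.execute(
    """
    SELECT protein.protein_id, protein.name, organism.name
    FROM protein
    JOIN organism ON protein.organism_id = organism.organism_id
    ORDER BY protein.protein_id
    """
).fetchall()

for row in rows:
    print(row)
```

In an equivalent flat file, the organism name would typically be repeated on every protein line; here it appears once and is reached through `organism_id`.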
The advantages of relational databases over flat files include the following:
- Data integrity: data in a relational database are harder to corrupt than data in a flat file because flat files, owing to their structure, frequently contain redundant information, which is more easily corrupted.
- Data consistency: each ‘entry’ in a relational database has a unique ID; this reduces the chances of having inconsistent data and multiple entries for the same item.
- Smaller file sizes for the same data: because they avoid the redundant information found in flat files, relational databases are more compact than their flat-file equivalents.
- Data availability: databases can be shared over a network and updated from many points.
- Speed: retrieval of information is typically faster from relational databases than it is from flat files containing the same data.
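The unique-ID point in the list above can be sketched in a few lines, again assuming SQLite via Python's `sqlite3` module. The `gene` table and the identifier `G1` are hypothetical; the sketch only shows how a database can refuse a second entry with an ID that already exists, something a plain flat file has no built-in mechanism to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: gene_id is declared as the unique ID (primary key).
conn.execute("CREATE TABLE gene (gene_id TEXT PRIMARY KEY, symbol TEXT NOT NULL)")
conn.execute("INSERT INTO gene VALUES ('G1', 'exampleA')")

# Try to add a second entry with the same ID.
try:
    conn.execute("INSERT INTO gene VALUES ('G1', 'exampleA-duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The database enforces uniqueness and refuses the duplicate.
    duplicate_rejected = True

# Count how many entries the table holds after the failed insert.
entry_count = conn.execute("SELECT COUNT(*) FROM gene").fetchone()[0]
print(duplicate_rejected, entry_count)
```

Other database systems enforce the same kind of constraint, although the exact error raised differs from one system to another.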
In a nutshell, relational databases provide efficient, reliable, convenient and safe multi-user storage of, and access to, massive amounts of persistent data. For a lengthier discussion of this topic, see Jennifer Widom’s introductory video on relational databases, below.