Accessing data in relational databases

Database languages

Working with a relational database involves two different types of language. Data definition language (DDL) is used to set up the structure of the database in the first place. DDL statements create, modify, and remove database objects such as tables, indexes, and users.
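To make this concrete, here is a minimal sketch of DDL statements written in SQL and run against a throwaway in-memory SQLite database using Python's built-in `sqlite3` module. The `sequences` table, its columns and the `idx_organism` index are hypothetical names chosen for illustration. User accounts are not shown because SQLite has no users; statements such as `CREATE USER` belong to server databases like PostgreSQL or MySQL.

```python
import sqlite3

# Throwaway in-memory database, used purely for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define a table to hold sequence records (hypothetical schema).
cur.execute("""
    CREATE TABLE sequences (
        accession TEXT PRIMARY KEY,
        organism  TEXT NOT NULL,
        sequence  TEXT NOT NULL
    )
""")

# CREATE: add an index so that look-ups by organism are faster.
cur.execute("CREATE INDEX idx_organism ON sequences (organism)")

# ALTER (modify): add a column for the date each sample was collected.
cur.execute("ALTER TABLE sequences ADD COLUMN collected TEXT")

# DROP (remove): delete the index again.
cur.execute("DROP INDEX idx_organism")
conn.commit()

# PRAGMA table_info returns one row per column; field 1 is the column name.
columns = [row[1] for row in cur.execute("PRAGMA table_info(sequences)")]
print(columns)  # ['accession', 'organism', 'sequence', 'collected']
```

Note that none of these statements touches any data: they only describe the containers that data will later be stored in.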

Data manipulation language (DML) is used to work with the data stored in those structures: adding, changing, deleting, and retrieving (querying) it. Structured query language (SQL) is the most commonly used language for relational databases, and it includes both DDL statements (such as CREATE TABLE) and DML statements (such as SELECT and UPDATE). We’re not going to teach you how to program in SQL in this course, but it’s worth knowing about because if you spend any time with bioinformaticians you’re likely to hear them talking about it. SQL statements are used to perform tasks such as updating data in a database, or retrieving data from a database (for example, to provide input for another database or for a workflow involving data from multiple sources).
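For illustration only, the sketch below shows the DML side of SQL: inserting rows, updating one of them and then retrieving a subset with SELECT. It again uses Python's `sqlite3` module with an in-memory database; the table layout, accession codes and sequence fragments are made-up placeholders, not real records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL first: a small table to work with (hypothetical schema).
conn.execute(
    "CREATE TABLE sequences (accession TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
)

# DML: INSERT rows. The ? placeholders let the driver pass values in safely.
records = [
    ("SEQ001", "Zaire ebolavirus", "ATGGATG"),
    ("SEQ002", "Zaire ebolavirus", "ATGGACG"),
    ("SEQ003", "Influenza A virus", "AGCAAAAG"),
]
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)

# DML: UPDATE an existing row, e.g. to correct a sequence.
conn.execute(
    "UPDATE sequences SET sequence = ? WHERE accession = ?", ("ATGGATC", "SEQ001")
)

# DML: SELECT (retrieve) only the rows that match a condition.
rows = conn.execute(
    "SELECT accession, sequence FROM sequences WHERE organism = ? ORDER BY accession",
    ("Zaire ebolavirus",),
).fetchall()
print(rows)  # [('SEQ001', 'ATGGATC'), ('SEQ002', 'ATGGACG')]
```

The SELECT statement returns plain Python tuples, which is what makes it easy to pass query results on to another program or workflow.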

Accessing data

How you access the data in a relational database depends partly on whether someone has built a user interface for it. You are accessing relational databases via a user interface every time you shop online, type an address into your satnav system, or search PubMed for a paper. If you are looking for specific pieces of information and want the results in a human-readable form, searching this way makes perfect sense, and this is how we expect you will use biological databases most, if not all, of the time.

However, let’s imagine that you work in a public health lab and you’re tracking an Ebola outbreak in West Africa. Samples from patients are being sequenced daily, and you need to compare their sequences to track the source of a particularly virulent variant. Using a query language to search the sequence databases every day and retrieve the results in a form that can be fed straight into a sequence comparison tool will save you a lot of time and, ultimately, could save many lives.
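The kind of automation described above might look something like the following sketch. It assumes a hypothetical `samples` table with `accession`, `collected` (an ISO date string) and `sequence` columns, and a hypothetical helper `fetch_recent_as_fasta` that returns every sequence collected on or after a given date as FASTA text, a plain-text format that many sequence comparison tools accept. A real public sequence database would have its own schema and access methods; the toy in-memory database here only stands in for one.

```python
import sqlite3


def fetch_recent_as_fasta(conn: sqlite3.Connection, since: str) -> str:
    """Return sequences collected on or after `since` (YYYY-MM-DD) as FASTA text.

    Hypothetical helper: assumes a table `samples(accession, collected, sequence)`.
    """
    rows = conn.execute(
        "SELECT accession, collected, sequence FROM samples "
        "WHERE collected >= ? ORDER BY collected, accession",
        (since,),  # ISO dates compare correctly as plain strings
    )
    # FASTA record: a '>' header line, then the sequence on the following line.
    return "".join(f">{acc} collected={day}\n{seq}\n" for acc, day, seq in rows)


# Toy database standing in for a real sequence repository (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (accession TEXT PRIMARY KEY, collected TEXT, sequence TEXT)"
)
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [
        ("PAT-0001", "2024-03-01", "ATGGATGCG"),
        ("PAT-0002", "2024-03-02", "ATGGACGCG"),
        ("PAT-0003", "2024-03-02", "ATGGATGCA"),
    ],
)

fasta = fetch_recent_as_fasta(conn, "2024-03-02")
print(fasta, end="")
```

Run as a scheduled daily job, `since` could be set to yesterday's date (for example `(date.today() - timedelta(days=1)).isoformat()` using Python's `datetime` module), and the resulting text written to a file for the comparison tool to read, with no manual searching or copying involved.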