Food security and bioinformatics

Food security and bioinformatics: Illustration by Spencer Phillips, EMBL-EBI

21 Nov 2017 - 16:14

To grow crops that stand up to the challenges of climate change, including drought and flooding, humans need to develop new varieties of plants. Genomics has enormous potential to advance plant science, and in turn help ensure the global security of food supplies. To enable research in this critical area, EMBL-EBI engages with all aspects of the plant, pollinator, pathogen and pest research communities.

Genomics: biomedicine and agriculture

As genomic sequencing has become more portable and accessible over the past two decades, it has become more widely used, especially in the biomedical domain. Progress in interpreting plant genomes has been slower, mainly because of their size and complexity.

But with recent advances in technology, and new methods for analysis, researchers are finally able to decipher the genomes of important crop plants like wheat. They are also able to characterise ‘natural variation’, which crop breeders use as their raw material.

DNA sequencing and other ‘high-throughput’ technologies are also enabling researchers to measure the traits (phenotypes) of plants on a large scale. Linking this information to genetic variation data makes it possible to identify genes that would be desirable in new crop lines.

Yet to many researchers, this information is still inaccessible. It can be hard to discover what kind of data is out there and where to find it. At the same time, the diversity of phenotypic measurements and experimental designs makes it difficult to match up the corresponding genetic information. That diversity also makes it hard to build on these datasets by adding the results of subsequent studies.

To solve these problems, the agri-food research community will need a new informatics ecosystem that addresses both technical challenges (how do we collect, structure and interpret the data?) and broader social ones (how do we persuade those who generate the data to share it?). Together with its collaborators at the Earlham Institute, Rothamsted Research, INRA and other leading research organisations, EMBL-EBI is at the forefront of these efforts.

EMBL-EBI: the hub of molecular information

As part of EMBL and an international, publicly funded ‘data steward’ for the life sciences, EMBL-EBI:
  • manages data resources that researchers use to study plants and their pathogens, pollinators and pests;
  • builds software and tools for finding, reusing and sharing data from plant and pathogen experiments;
  • adds value to plant genome data by integrating and interpreting the results from multiple experiments;
  • participates actively in setting standards for many different data types so that different datasets may be combined and studied in new ways over time.


Reduced diversity, increased risk

Modern crops have been developed through the domestication of wild species. Selective breeding for desirable traits has produced ‘elite’ varieties with astonishing yields compared with their predecessors of a few decades ago, let alone a few hundred years ago.

Yet the very process that selects for advantageous traits also removes genetic variation from the population – an effect known as a ‘selective sweep’. The consequence is that domesticated plants may show a reduced ability to adapt to new environmental conditions, while continued selection can leave a shrinking pool of elite material from which further varieties can be bred.

“In terms of diversity, we can draw a parallel between plant domestication and human migration,” says Paul Kersey, whose team runs the plant genomics resources at EMBL-EBI. “When human populations migrated out of Africa, those who left took with them only a fraction of the diversity of the entire African population. Similarly, in plants, the act of domestication was itself a bottleneck.”
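
The bottleneck effect can be illustrated with a toy simulation. This is a minimal sketch rather than a population-genetics model: the allele names, the population size and the number of founders are all invented for illustration.

```python
import random

def distinct_alleles(population):
    """Count how many different alleles are present in a population."""
    return len(set(population))

def founder_sample(population, n_founders, seed=0):
    """Pick a small founder group at random, as in a domestication bottleneck."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(population, n_founders)

# Hypothetical wild population: 1,000 plants carrying 50 alleles of one gene.
wild = [f"allele_{i % 50}" for i in range(1000)]
founders = founder_sample(wild, n_founders=20)

print(distinct_alleles(wild))      # 50
print(distinct_alleles(founders))  # at most 20, since only 20 plants were kept
```

However the founders are chosen, a group of 20 plants cannot carry more than 20 of the 50 alleles, so most of the original diversity is left behind.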

Plants, pollinators and pests: data resources

  • Ensembl Genomes: A detailed knowledge base with genome-scale data and information about plants, bacteria, fungi, protists and metazoa.
  • WormBase ParaSite: Focused on draft genomes of parasitic nematodes. Currently includes six plant parasites.
  • PhytoPath: Genome-scale plant pathogen data and associated phenotype information, focused on microorganisms.

Fragile elites

The current elite varieties that make up our staple crops are highly optimised but fragile, like performance sports cars. They have been designed to do well in the environments where people planned to grow them, so if conditions change, they can be very vulnerable (e.g. to disease or climate change).

“If you look at the period before domestication, there was truly enormous diversity in the ancestors of crop species compared to their descendants,” says Kersey. “But if we went back to eating these pre-domestication foods, 90% of the planet’s human population would starve to death. The foods we eat now are extraordinarily different from those we ate before domestication. Without selection, the human population could never have grown to its current size.”

“For example, the original corn-on-the-cob species had a tiny head that could produce only a small amount of food per plant. Worse, some plants, such as wild potatoes, are tiny and poisonous – you actually need to soak them in water for three days before they become edible,” adds Dan Bolser, Ensembl Plants Project Leader at EMBL-EBI. “It’s the thousands of years of breeding started by Native American populations (and continued by more modern breeders) that have allowed us to produce our current crop.”

Figuring out how domestication has altered the biology of wild species requires some sophisticated detective work. It means picking apart evolutionary processes to understand the pressures domestication has put on different species.

Armed with that knowledge, you can start to develop a picture of how to re-introduce lost diversity into crop species – an important step towards securing the food supply.

How do plant breeders improve crop varieties?

Genetic diversity itself is the raw material researchers use to improve crop species. They find variants of genes in individual plants or species that can confer desirable characteristics (e.g. pest resistance, higher nutritional value) on commercial crop species. Commercial varieties already contain a complex mixture of desirable genes, so the goal is to introduce specific new variants without disrupting the wider genetic background.

Classically, breeders identify a gene responsible for a desirable trait, and introduce it into the elite lines via a process of repeated ‘back-crossing’:

  • First, a variety containing the new variant is mated (crossed) with an elite variety.
  • Then, offspring with the desirable trait are selected.
  • These are crossed again with the elite parent, and so on.
  • Eventually, the breeder isolates individual plants in which the new variant is present in (essentially) the same genetic background as the original elite line.
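
The arithmetic behind repeated back-crossing can be sketched in a few lines. The function name below is hypothetical, and the calculation gives only the expected average: it ignores selection, and in practice the region around the new variant stays with the donor for longer.

```python
def elite_fraction(backcrosses):
    """Expected share of the genome inherited from the elite (recurrent) parent.

    The initial cross gives offspring that are, on average, half elite.
    Each later cross to the elite parent halves the remaining donor share.
    """
    donor_share = 0.5  # after the initial cross
    for _ in range(backcrosses):
        donor_share /= 2
    return 1 - donor_share

for n in range(0, 7):
    print(f"after {n} back-crosses: {elite_fraction(n):.1%} elite background")
```

Under these assumptions, six back-crosses leave an expected 99.2% elite background, which is why many generations are needed before the new line is "essentially" the same as the original.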

How plant cross-breeding works

How ‘back-crossing’ helps improve crops, simplified. Illustration by Spencer Phillips at EMBL-EBI.


Plant breeders can accelerate the process of cross-breeding if they know the sequence of the ‘desirable’ gene. They can use molecular methods to detect the gene in each new generation of offspring, without waiting for the plants to grow and show the trait.
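
As a rough illustration of screening offspring by sequence rather than by appearance, the sketch below checks whether a short marker sequence occurs in each seedling's DNA. The marker, the seedling sequences and the function name are all invented, and real laboratories use molecular assays rather than exact text matching.

```python
def carries_marker(sample_sequence, marker):
    """Return True if the marker sequence occurs in the sample (either strand)."""
    complement = str.maketrans("ACGT", "TGCA")
    reverse_complement = marker.translate(complement)[::-1]
    return marker in sample_sequence or reverse_complement in sample_sequence

# Hypothetical marker and seedling sequences, invented for illustration.
MARKER = "GATTACAGGT"
seedlings = {
    "seedling_1": "CCCGATTACAGGTAAA",   # marker on the forward strand
    "seedling_2": "CCCACCTGTAATCAAA",   # marker on the reverse strand
    "seedling_3": "CCCGATTTCAGGTAAA",   # one base differs: no match
}

keep = [name for name, seq in seedlings.items() if carries_marker(seq, MARKER)]
print(keep)  # ['seedling_1', 'seedling_2']
```

Only the seedlings in `keep` would be grown on and crossed again, so the breeder does not have to wait for every plant to mature.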

This saves a considerable amount of time, but the process is still a lengthy one. For example, the International Rice Research Institute in the Philippines characterised and isolated the gene SUB1, responsible for flood-tolerant traits in a local variety of rice. It then took about a decade to breed this gene into the high-yielding line.

The use of this new strain, which combined flood-tolerance with traits already present in the previous elite lines, such as good yield and pest resistance, made a huge difference for the flood-prone regions of Asia where rice is grown.

Can we speed up this process in future?

CRISPR, a recently developed technology that allows very precise genome editing, could enable researchers to insert a desirable gene directly into the elite line, bypassing the many steps of ‘back-crossing’. Traditional breeding is, in effect, merely a slower form of genetic modification. But in many countries, public fears about new techniques for genetic modification have led to restrictions on the use of faster methods in crops.

Regardless of the techniques used within crop-improvement programmes, technologies that facilitate the large-scale sequencing of elite crops, traditional varieties and their wild relatives are potentially transformative. It becomes possible to construct detailed maps of genomic and phenotypic diversity, which allow scientists to understand plant variation better and to pinpoint the variants responsible for particular traits.

Data for design

Using data resources like Ensembl Plants, researchers can identify relevant genetic variation in plant species. Through statistical analysis, they can work out which variants that are not present in the elite lines are potentially useful, and use that information to choose the ideal parents for introducing these variants into a favourable genetic background – a process known as “precision breeding”.
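
The simplest form of this kind of analysis is to group plants by the allele they carry at one site and compare the trait between groups. The sketch below uses invented plant lines, alleles and trait scores; real studies test many variants at once and use proper statistical tests.

```python
from statistics import mean

# Hypothetical records: (plant line, allele at one variant site, trait score).
records = [
    ("line_A", "G", 4.1), ("line_B", "G", 3.9), ("line_C", "G", 4.3),
    ("line_D", "T", 6.8), ("line_E", "T", 7.1), ("line_F", "T", 6.5),
]

def mean_trait_by_allele(rows):
    """Group trait scores by allele and return the mean score for each allele."""
    groups = {}
    for _line, allele, score in rows:
        groups.setdefault(allele, []).append(score)
    return {allele: mean(scores) for allele, scores in groups.items()}

means = mean_trait_by_allele(records)
difference = means["T"] - means["G"]
print(means, round(difference, 2))
```

In this made-up dataset, lines carrying the "T" allele score higher on average, which would make them candidate parents if "T" is missing from the elite lines.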

Kersey explains: “To design the best strategy for breeding a new crop variety, you need all the information you can get. If more data on genetic variation can be accessed, and more strains have been characterised genetically and phenotypically, the more precisely you can associate a trait with its causative gene, and the wider a pool of potential parents you have to use in the breeding process.”

“Public seed banks are particularly important here – they contain a huge collection of seeds from traditional varieties and wild relatives of crops. The more data we have about these seeds, the easier it will be to exploit them to improve agricultural yields.”

Better wheat

The genomes of many plants are much bigger than the human genome. For example, the genome of barley is around 60% larger than that of humans. “Polyploidisation”, a process whereby two related species cross and the offspring contains the full genomes of both parents, can add another level of complexity and is commonly seen in staple crops. The bread-wheat genome can be described as three barley-like genomes wrapped up in a single organism. It is about five times the size of a human genome.
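
The figures in this paragraph are consistent with one another, as a quick calculation shows. The sketch takes the human genome as the unit of size and uses only the ratios quoted above; the variable names are illustrative.

```python
HUMAN = 1.0               # take the human genome as the unit of size
barley = HUMAN * 1.6      # "around 60% larger" than the human genome
bread_wheat = 3 * barley  # three barley-like genomes in a single organism

print(f"barley is about {barley:.1f}x the human genome")
print(f"bread wheat is about {bread_wheat:.1f}x the human genome")
```

Three genomes at 1.6 times the human size give 4.8, which rounds to the "about five times" quoted for bread wheat.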

The bread wheat genome: three genomes combined.


And it’s not just that a larger genome costs more money to sequence. These genomes are so different from those of humans and other model organisms that the usual methods are not always up to the task of deciphering them.

Genomes are sequenced in pieces, which are then assembled and annotated (i.e. labelled). But what makes many cereal genomes so big is the activity of transposons – genetic elements that can ‘jump around’ the genome and alter a cell’s genetic identity. The resulting repetitive patterns are extremely difficult to untangle, disambiguate and interpret.
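
A toy example shows why repeats are so hard to untangle. In the sketch below, a short invented "genome" contains three copies of the same transposon-like element; a sequenced piece that falls inside the element fits equally well in three places. All sequences and names are hypothetical.

```python
def placements(read, genome):
    """Return every position where the read matches the genome exactly."""
    return [i for i in range(len(genome) - len(read) + 1)
            if genome[i:i + len(read)] == read]

# Hypothetical toy genome: a transposon-like element copied to three places.
REPEAT = "ACGTTGCA"
genome = "TTAGC" + REPEAT + "GGATC" + REPEAT + "CCTAG" + REPEAT + "AATCG"

unique_read = "AGCACGTT"  # spans a unique flank and the start of a repeat copy
repeat_read = "CGTTGC"    # lies entirely inside the repeat

print(placements(unique_read, genome))  # [2]: one possible position
print(placements(repeat_read, genome))  # [6, 19, 32]: three equally good positions
```

In a cereal genome the same ambiguity arises across vast numbers of repeat copies, so pieces that fall inside repeats cannot be placed without additional information.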

New methods have emerged that have made this task more manageable. A new version of the barley genome featured on a recent cover of Nature, and the wheat reference genome has also undergone dramatic recent improvement. As these new datasets are published, Kersey’s team analyses them and incorporates them within Ensembl Plants.

The team links different versions of reference genomes so that researchers working in many fields can continue their work as new knowledge is added. Like most EMBL-EBI data resources, these genomes are available for everyone to use, without restriction or charge.

High standards

To link the traits of a plant to specific genetic variants, one needs to measure those traits consistently – a process known as phenotyping. Data from phenotyping experiments is complex and diverse.

For example, a study of drought resistance in plants could include several measures of evaporation, from a microscopic count of the number of pores on a leaf all the way up to aerial photography of whole fields.

EMBL-EBI is helping the community agree on standards to describe these diverse datasets. Kersey is on the steering committee of a major standards initiative called MIAPPE: Minimum Information About a Plant Phenotyping Experiment. MIAPPE is building community consensus on exactly what information (metadata) is necessary to describe plant-phenotyping experiments.
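
The idea of a "minimum information" standard can be sketched as a simple completeness check. The field names below are hypothetical, chosen only for illustration; they are not the actual MIAPPE checklist.

```python
# Hypothetical required fields, chosen only for illustration.
# They are NOT the actual MIAPPE checklist.
REQUIRED_FIELDS = ("study_id", "species", "trait", "unit", "method")

def missing_fields(record, required=REQUIRED_FIELDS):
    """List the required metadata fields that are absent or empty in a record."""
    return [field for field in required if not record.get(field)]

record = {
    "study_id": "drought-trial-01",
    "species": "Triticum aestivum",
    "trait": "stomatal density",
    "unit": "",            # left blank by the submitter
}

print(missing_fields(record))  # ['unit', 'method']
```

A dataset that passes such a check can be compared with others described in the same way, which is the point of agreeing on the metadata up front.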

Everything in its right place

Data sharing is not yet a standard working practice for plant phenotyping, partly because there are few public repositories where data could be kept.

“A plant breeder looking for any particular trait should be able to look up a public dictionary of available genetic variation, and see the traits that have been measured on public stocks that carry these variants. This sort of information is now accessible for variants causing human disease – it also needs to be available for researchers working on crops,” says Kersey.

Designing Future Wheat

Since the original publication of the reference bread-wheat genome in 2015, crop genomics has accelerated. Over the next year, the whole genomes of at least ten varieties of wheat are likely to be assembled and shared openly. This is a big jump forward, and opens up possibilities to work with cereal data in many new ways.

A new research programme focused on wheat, funded by the Biotechnology and Biological Sciences Research Council (BBSRC), started in 2017. The BBSRC is funding EMBL-EBI and a number of leading UK institutions (the John Innes Centre, Rothamsted Research, the University of Bristol, the University of Nottingham, the Earlham Institute and the National Institute of Agricultural Botany) to build a scientific knowledge base to support the design of future wheat varieties.

Seed banks as data-rich resources

EMBL-EBI also participates in DivSeek, an initiative linking seed banks with organisations interested in plant informatics. Seed banks around the world are in the process of extending genomic analysis from a handful of reference cultivars to thousands of seed stocks. DivSeek partners are developing a consistent, reliable data infrastructure that will make it easier to store and re-use this precious information. Participants include the centres of the Consultative Group on International Agricultural Research (CGIAR), national repositories, and botanic gardens.

One critical part of this process is connecting the physical material, its descriptions and the results of the experiments performed on it – such as DNA sequence information stored in public data archives. The first step is to identify the material clearly and consistently.

“There are very exciting opportunities in turning physical seed repositories into information repositories that can be used directly by scientists and breeders all over the world. It is a huge challenge, and one we hope to play a part in,” says Kersey. “At the same time, there are legitimate concerns about how the benefits of genomic knowledge will be shared with the developing nations from which many of these seeds have been collected. It’s important we address these effectively to ensure these resources continue to be available – and accessible – to benefit the entire global community.”

Pooling knowledge

Just as genomics is playing an increasingly central role in human health, it can help us mitigate some of the effects of climate change and make our food production more sustainable. But new discoveries will depend on pooling knowledge from many sources. An open culture of data sharing and collaboration will be critical for achieving this potential.

This article is based on an interview with Paul Kersey and Dan Bolser. Thanks to our many collaborators and funders who make this work possible.
