Scientific method and data life cycle
Data is key for the scientific method
The scientific method is used to understand natural processes. Through observation, hypothesis generation, experimentation, analysis, modeling and interpretation of results, scientists generate scientific facts that are then integrated into theories and laws. There are different conceptualisations of the scientific process, which subdivide it into more specific or more general steps. This subdivision depends heavily on the scientific discipline and the technological development in the field. What remains universal, however, is that data is generated at each step of the scientific method, and the method cannot be practiced without data. This is why, whenever we do science, we inherently do data management.
The data life cycle is central to data management
Traditionally, researchers have focused on following scientific methodology without considering the data journey. This was manageable when datasets were small and could be handled manually, metadata could be easily remembered, and data sharing was rare. Some loss of metadata could be compensated for by unstructured descriptions in the resulting research articles.
Today, the rise of big data reveals that this approach is unsustainable. What counts as "big" depends on the researchers' resources and skills. When a scientific group reaches its threshold of big data, the following issues become apparent:

- Research groups and institutions are often unprepared for the scale of modern datasets.
- In some cases, data is simply stored and never used or published.
- Inefficient storage practices drive up energy consumption and hardware costs.
- Inadequate metadata collection creates inefficiencies in data handling and processing, causing delays and errors.
- Lack of standardisation at all levels creates inconsistencies whose correction takes time and is error-prone.
- Lack of clarity about stakeholder responsibilities and communication channels leads to further inefficiencies, delays and possible data loss.
- Data management is frequently delegated to early-career researchers, who are often unprepared for the task, receive little support and go unacknowledged for it.
As a result, significant human, financial, and energy resources are wasted. It is crucial to recognise that the data life cycle is part of the scientific method. Continue to the next page to learn about the data life cycle.