Improving AI-based Bioimage analysis
Summary on course page: Artificial Intelligence (AI) algorithms can outperform classical image analysis methods; however, the performance of these models depends heavily on the quality of the annotated image datasets used to train them. In this project, you will explore the application of AI to biological imaging and the relationship between training data and model performance. You will use models stored in the BioImage Model Zoo and data in the BioImage Archive to fine-tune models and aggregate AI outputs. The aim of this project is to test, evaluate, and improve model performance on a diverse set of microscopy images and annotations within the BioImage Archive. You will learn how to apply, train, and tune state-of-the-art computer vision models. The project also demonstrates the essential role that FAIR (Findable, Accessible, Interoperable, Reusable) data plays in the training and improvement of AI models.
Scenarios
Scenario 1:
You are interested in a biological question/research project and you have the necessary microscopy data. For the project, you need to segment the nuclei and/or cytoplasm using AI. You tried an off-the-shelf AI model, but it missed most of the cells or gave inconsistent results. You want to improve these results or, better, train a model that performs well on your dataset. However, you have no ground truth segmentations for your own data, and you lack the expertise (or the time) to produce them yourself. Instead, you choose a dataset in the BioImage Archive that is similar to yours and has ground truth segmentations, and you use it to train different models and select the one that performs best for your data.
Scenario 2:
You are interested in a biological question and want to investigate whether a dataset already in the BioImage Archive can answer it. To do this, you need to segment the nuclei and/or cytoplasm using AI. You first try an off-the-shelf AI model, but it misses most of the cells or gives inconsistent results. You want to improve these results or, better, train several models and choose the one that performs best on your dataset.
Scenario 3:
You have written, or want to write, your own ML model for segmentation and you want to compare its performance with other models, but you have no ground truth data. You choose one of the datasets in the BioImage Archive with ground truth segmentations and use it to train different models, including your own, and compare the results to evaluate them. When you are happy with your model's performance, you share it with the world by uploading it to the BioImage Model Zoo and/or Hugging Face.
Scenario 4:
[You want to train a large segmentation model]
Dataset
- Nuclei +/- Cytoplasm
- 2D dataset
- Minimum 50 images
- Ground truth annotations
- Available in an AI gallery view
- Accessible via the BioImage Archive API (see the data-access sketch at the end of this section)
- An annotated fluorescence image dataset for training nuclear segmentation methods: https://www.ebi.ac.uk/biostudies/BioImages/studies/S-BIAD634
And AI Gallery view:
https://www.ebi.ac.uk/bioimage-archive/galleries/S-BIAD634-ai.html
- Confocal microscopy images of human cells: https://www.ebi.ac.uk/biostudies/BioImages/studies/S-BIAD144
And segmentation masks for S-BIAD144: https://www.ebi.ac.uk/biostudies/BioImages/studies/S-BIAD916
- Confocal microscopy and smFISH data of Arabidopsis thaliana root cells: https://www.ebi.ac.uk/biostudies/bioimages/studies/S-BIAD425
And segmentation masks for S-BIAD425:
https://www.ebi.ac.uk/biostudies/bioimages/studies/S-BIAD962
- ZeroCostDL4Mic Stardist example training and test dataset: https://www.ebi.ac.uk/biostudies/bioimages/studies/S-BIAD895
- 3D cell shape of Drosophila Wing Disc: https://www.ebi.ac.uk/bioimage-archive/galleries/S-BIAD843-ai.html
- Spinning disk confocal microscopy images and segmentation masks of African green monkey kidney cell line cells transfected to express SARS-CoV-2 antigens: https://www.ebi.ac.uk/biostudies/bioimages/studies/S-BIAD1076
- Non-nuclear segmentation: https://www.ebi.ac.uk/biostudies/bioimages/studies/S-BIAD300
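As a starting point for the API access criterion above, here is a minimal sketch of querying one of these accessions through the BioStudies REST API with `requests`. The endpoint paths and JSON keys shown are assumptions based on the public BioStudies service; check the BioImage Archive documentation for the definitive interface.

```python
# Sketch: fetch study metadata for a BioImage Archive accession via the
# BioStudies REST API. The /api/v1/studies/{accession} and .../info endpoints
# and the JSON keys used below are illustrative assumptions; verify them
# against the current BioStudies / BioImage Archive documentation.
import requests

ACCESSION = "S-BIAD634"
BASE = "https://www.ebi.ac.uk/biostudies/api/v1"

# Study metadata (title, authors, attributes) as JSON
study = requests.get(f"{BASE}/studies/{ACCESSION}", timeout=30).json()
print(study.get("section", {}).get("attributes", []))

# Study info, which typically points at the FTP/HTTP location of the files
info = requests.get(f"{BASE}/studies/{ACCESSION}/info", timeout=30).json()
print(info.get("ftpLink"))
```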
Project aims
Develop your project in the direction of your choice, identifying interesting biological questions and models to test. Do not worry if you do not complete all the suggested aims. The code sketches after this list illustrate some of the steps (data ingest, metrics, model training, and model upload).
- Data ingest
  - Interact with the BioImage Archive API
  - Convert API outputs to Torchvision/torch tensors
  - Data examples:
- Inference
  - Metrics
- Model training
  - UNet [MONAI]
  - Cellpose [https://github.com/MouseLand/cellpose]
  - BioImage Model Zoo (BMZ) models
- Model selection and comparison
  - Fine-tuned BMZ models
  - End-to-end MONAI
- Upload model to model zoos
  - Hugging Face
  - BMZ
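For the "Convert API outputs to Torchvision/torch tensors" step, below is a minimal sketch of loading a downloaded image and its ground-truth mask into PyTorch tensors. It assumes single-channel (H, W) TIFF files read with tifffile; the file names are placeholders.

```python
# Sketch: load a downloaded microscopy image and its ground-truth mask as torch
# tensors ready for a segmentation model. File paths are placeholders and the
# image is assumed to be a single-channel 2D TIFF.
import numpy as np
import tifffile
import torch

image = tifffile.imread("image_0001.tif").astype(np.float32)   # (H, W)
mask = tifffile.imread("mask_0001.tif").astype(np.int64)       # integer label per nucleus

# Normalise intensities and add the batch/channel dimensions torch models expect
image = (image - image.mean()) / (image.std() + 1e-8)
image_t = torch.from_numpy(image)[None, None, ...]   # (1, 1, H, W)
mask_t = torch.from_numpy(mask)[None, None, ...]     # (1, 1, H, W)
print(image_t.shape, mask_t.shape)
```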
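For the "Metrics" step, the Dice coefficient and intersection-over-union (IoU) are two standard overlap scores for comparing a predicted mask with ground truth. The sketch below computes both on binary masks; binarise instance labels first (e.g. `mask > 0`).

```python
# Sketch: Dice and IoU for binary (foreground/background) segmentation masks.
import numpy as np

def dice_and_iou(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    """Return (Dice, IoU) for two binary masks of the same shape."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    dice = 2.0 * intersection / (pred.sum() + truth.sum() + 1e-8)
    iou = intersection / (union + 1e-8)
    return float(dice), float(iou)

# Example: compare an off-the-shelf model's prediction to the archive's ground truth
# dice, iou = dice_and_iou(predicted_mask > 0, ground_truth_mask > 0)
```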
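For the "UNet [MONAI]" step, a minimal training-loop sketch is shown below. The network hyperparameters are illustrative rather than tuned, and `loader` is a hypothetical torch DataLoader yielding (image, mask) batches of shape (B, 1, H, W) built from the archive data.

```python
# Sketch: 2D UNet from MONAI trained with a Dice loss. Channel/stride choices are
# illustrative defaults, not tuned values; `loader` is a placeholder DataLoader.
import torch
from monai.networks.nets import UNet
from monai.losses import DiceLoss

device = "cuda" if torch.cuda.is_available() else "cpu"
model = UNet(
    spatial_dims=2,
    in_channels=1,
    out_channels=1,               # binary nucleus / background
    channels=(16, 32, 64, 128),
    strides=(2, 2, 2),
    num_res_units=2,
).to(device)

loss_fn = DiceLoss(sigmoid=True)  # sigmoid applied to raw logits inside the loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    for image_batch, mask_batch in loader:          # hypothetical DataLoader
        optimizer.zero_grad()
        logits = model(image_batch.to(device))
        loss = loss_fn(logits, mask_batch.to(device))
        loss.backward()
        optimizer.step()
```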
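For the "Cellpose" step, here is a minimal inference sketch using the pretrained "nuclei" model. It follows the Cellpose 2.x interface (`models.Cellpose` / `model.eval`); newer releases may expose a slightly different API, so treat it as a starting point.

```python
# Sketch: run the pretrained Cellpose "nuclei" model on one grayscale image.
# Follows the Cellpose 2.x API; the file name is a placeholder.
import tifffile
from cellpose import models

image = tifffile.imread("image_0001.tif")

model = models.Cellpose(model_type="nuclei")
masks, flows, styles, diams = model.eval(
    image,
    diameter=None,        # let Cellpose estimate the object diameter
    channels=[0, 0],      # grayscale: segment channel 0, no second channel
)
print("objects found:", masks.max())
```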
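For the "Upload model to model zoos" step, the huggingface_hub client can push a folder of trained weights to the Hugging Face Hub; the repository id below is a placeholder and the sketch assumes you are already authenticated (e.g. via `huggingface-cli login`). Uploading to the BioImage Model Zoo additionally requires packaging the model with a bioimage.io resource description, for which the BMZ documentation is the reference.

```python
# Sketch: push a folder containing trained weights and a model card to the
# Hugging Face Hub. "your-username/nuclei-unet" is a placeholder repo id.
from huggingface_hub import HfApi

api = HfApi()
api.create_repo("your-username/nuclei-unet", repo_type="model", exist_ok=True)
api.upload_folder(
    folder_path="trained_model/",        # weights, config, README.md model card
    repo_id="your-username/nuclei-unet",
    repo_type="model",
)
```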
Project mentors:
Aybuke Kupcu Yoldas | EMBL-EBI

Aybuke is a senior bioinformatician for the BioImage Archive at EMBL-EBI. Her background is in physics and astronomy, with the recent addition of biological imaging data. Prior to joining EMBL-EBI, she was a postdoctoral researcher at the University of Cambridge Institute of Astronomy, and before that at the European Southern Observatory in Germany. Since her PhD, Aybuke's focus has been on image analysis and data management. During that time she has worked on many astronomy projects as well as on biological imaging data, as part of a Cancer Research Grand Challenge project and a COVID-19 project.
Craig Russell | EMBL-EBI

Craig is a staff scientist at the European Bioinformatics Institute (EMBL-EBI). His main interests are the analysis of large biological imaging datasets, cloud computing and infrastructure, and using machine learning to develop new image analysis tools. He obtained his PhD in microscope development from the University of Cambridge.