Recorded webinar
LLM-generated summaries for protein classification at InterPro
This webinar will explore how Large Language Models (LLMs) can accelerate protein classification by automatically generating descriptive annotations for previously unannotated protein families. Traditionally, curating protein family descriptions relies on manual literature review and expert knowledge, a time-consuming approach that often delays integration into biological databases. In this session, we will discuss our workflow, which uses LLMs to synthesise functional summaries from existing curated data, thereby streamlining the annotation process. We will also present a comparative evaluation of a state-of-the-art GPT model and a fine-tuned local model, demonstrating that smaller, cost-effective LLMs can produce high-quality descriptions that support rapid protein classification.
Who is this course for?
This webinar is designed for bioinformaticians, computational biologists, data scientists, and researchers interested in applying AI and language models to biological problems.
This event is part of a webinar series exploring the revolutionary potential of Large Language Models (LLMs) in bioinformatics and computational biology. For details on all topics covered in this series and registration information, please visit the following link: Large Language Models and their applications in Bioinformatics
Outcomes
By the end of the webinar you will be able to:
- Identify the challenges of manual protein family curation and explain how LLMs offer a scalable solution.
- Explore the methodology for generating automated descriptions from curated biological data.
- Evaluate the performance of different LLM approaches in producing reliable annotations.
- Recognise the potential of LLMs to transform protein classification and accelerate data integration in bioinformatics.
DOI: 10.6019/TOL.InterPro-LLM-w.2025.00001.1
This webinar took place on 19 March 2025. Please click the 'Watch video' button to view the recording.