Workshop Descriptions | Biocuration 2013

Galaxy workshop: Monday 3-5pm

Chairs: Suzanna Lewis and Dave Clements

Galaxy is a data integration and analysis platform for the life sciences. GO Galaxy (http://galaxy.berkeleybop.org/) is a free, Galaxy-based, public web server that provides a simple integrated environment in which ontological analysis tools can be linked into workflows. The GO Tools page lists more than 50 tools for doing GO-based analyses, but these are not well integrated. AmiGO/GOOSE offers some functionality, such as slimming, enrichment and data extraction, but these functions are difficult to chain together. The GO Galaxy server addresses these needs. This workshop will introduce Galaxy and then demonstrate how to use Galaxy to do:

Basic genomic analysis, and
Enrichment analysis using the GO Galaxy Server.

The motivating GO Galaxy application is enrichment analysis: given a set of genes, what biological process(es) is this set enriched for? A standard for representing the results of term enrichment analyses is proposed so that the results from alternative tools can be directly compared.
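The enrichment question above is often framed as an upper-tail hypergeometric test. The following is a minimal sketch of that idea, not a description of how GO Galaxy or any specific GO tool computes enrichment; the function name, parameter names and toy numbers are illustrative assumptions.

```python
from math import comb


def enrichment_p_value(population_size, annotated_in_population,
                       sample_size, annotated_in_sample):
    """Upper-tail hypergeometric probability P(X >= annotated_in_sample).

    population_size: number of genes in the background set
    annotated_in_population: background genes annotated to the term
    sample_size: number of genes in the gene set of interest
    annotated_in_sample: genes of interest annotated to the term
    """
    total = comb(population_size, sample_size)
    # The overlap cannot exceed either the sample or the annotated genes.
    upper = min(annotated_in_population, sample_size)
    tail = sum(
        comb(annotated_in_population, k)
        * comb(population_size - annotated_in_population, sample_size - k)
        for k in range(annotated_in_sample, upper + 1)
    )
    return tail / total


# Toy example (hypothetical numbers): 10 background genes, 5 annotated
# to a term, and all 5 genes in the set of interest carry the annotation.
p = enrichment_p_value(10, 5, 5, 5)
print(p)
```

A small p-value suggests the term is over-represented in the gene set. In practice many terms are tested at once, so a multiple-testing correction would normally follow.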

Participants will gain specific knowledge about how to use Galaxy to perform these types of analysis, and how to repeat, reuse, share, and publish their analyses. This workshop is geared towards biologists. No programming or command line knowledge is assumed or required. A basic knowledge of ontology principles will be helpful.

On Sequence Similarity and using Web Services for biocuration (2 presentations of 1 hour each): Monday 3-5pm

Chairs: Dr William Pearson (UVA), Rodrigo Lopez and Hamish McWilliam (EBI)

Sequence similarity searching is the most powerful tool to infer evolutionary relationships between genes, assign function to novel genes and ultimately annotate entire genomes and their proteomes. Dr. William Pearson will talk about sequence similarity searching from the perspective of gene products and describe search results beyond the humble sequence alignment. His talk is entitled "Looking at Proteins, Using More Than Sequences".

Access to databases and analytical tools using Web Services perfectly suits the biological data curation process. These services allow for the seamless integration of large databases and complex analytical tools into the curation workbench. Rodrigo Lopez and Hamish McWilliam will talk about the technology and tools available to the biocuration process. They will present examples of existing curation pipelines that use these services and invite participants to get involved and actively influence their development.

1. Looking at Proteins, Using More Than Sequences: Dr. William Pearson, UVA.
2. EBI Web Services for biocuration: Rodrigo Lopez/Hamish McWilliam, EBI.

Variation annotation workshop (LOVD): Monday 3-5pm

Chairs: Dr Raymond Dalgleish (Univ Leicester) and Peter Taschner (Leiden Univ Medical Centre)

The Leiden Open Variation Database system (LOVD: http://lovd.nl/) is the leading solution for the online gene-centric collection and display of DNA variation. This variation is often associated with inherited diseases, and LOVD databases have been created for every disease gene that is described in Online Mendelian Inheritance in Man (OMIM: http://www.ncbi.nlm.nih.gov/omim). Many of these databases are actively curated, but others are in need of enthusiastic curators to adopt them and develop their data content. We will present an introduction to LOVD and explain the tasks involved in discovering and curating data for entry into the database, as well as covering aspects of variant description including reporting standards and reference sequences.

Emerging standards for genome annotation and curation in the era of high throughput sequencing: Tuesday 3-5pm

Chair: Kim Pruitt (NCBI)

Next-generation sequencing has enabled researchers to perform genomic and transcriptomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which leads to a significant increase in the number of assembled genomes being deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the submission, annotation and analysis pipelines. New standards for the submission, validation, analysis, and curation of genome data must be developed for both reference genomes and population studies derived from clinical outbreaks. This workshop will provide an overview of the interplay between computational processes, tools, and curation activities and how this combined approach is poised to deal with the data onslaught while continuing to maintain data integrity within NCBI resources. Information will be provided about efforts underway to streamline data submission, improve annotation pipelines, automate validation and provide analysis tools for data visualization across the multitude of genomes. Talks and discussions will clarify the important role of curation within these process flows and how it helps to improve data quality.

The participants of the workshop will benefit from understanding NCBI process flows and curation protocols, as well as the standards and policies for data quality assurance developed by NCBI in conjunction with the genome community.

Topics and speakers

1. Data submission processing: using computational and curation approaches to streamline submissions and validation and improve public data – speaker: Ilene Mizrachi

2. Using NGS in eukaryotic genome projects: how curation activities help improve outcomes of the eukaryotic genome annotation pipeline, sequence variation, and browser resources – speaker: Kim Pruitt

3. Managing high-throughput prokaryotic genome projects: streamlining genome annotation and curation activities to manage bacterial reference genomes and sequence variation – speaker: Tatiana Tatusova

BioCreative Text Mining Workshop for Biocuration: Tuesday 3-5pm

Presenters: Cecilia Arighi¹, Kevin Cohen², Martin Krallinger³, and Zhiyong Lu⁴

¹ Center for Bioinformatics and Computational Biology, University of Delaware, DE, USA

² Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO, USA

³ Structural Biology and Biocomputing Group, Spanish National Cancer Research Centre, Madrid, Spain

⁴ National Center for Biotechnology Information, NIH, Bethesda, MD, USA

BioCreative (Critical Assessment of Information Extraction in Biology) is an international community-wide effort that evaluates text mining and information extraction systems applied to the biological domain (http://www.biocreative.org/). A unique characteristic of this effort is its collaborative and interdisciplinary nature, as it brings together experts from various fields, including text mining, biocuration, publishing houses and bioinformatics. This allows participants in the accompanying BioCreative Workshops to discuss how to drive the development of text mining systems that can be integrated into the biocuration workflow and the knowledge discovery process. To address the current barriers to using text mining in biology, BioCreative has also been conducting user requirements analyses and user-based evaluations, and fostering standards development for text mining tool re-use and integration.

This workshop will present several text-mining research topics addressed by the BioCreative efforts that are of particular relevance for literature curation. These topics include the extraction of bio-entity annotations using standard bio-ontologies (e.g. Gene Ontology annotation), the identification of bio-entities relevant for curation (e.g. chemical compounds and drugs), and aspects dealing with text mining systems’ utility/usability and interoperability.

The aim of this workshop is to encourage active involvement of biocurators in guiding text mining system development and adoption by demonstrating and discussing past and current efforts of the BioCreative challenges. Participation in this workshop will give biocurators the opportunity to learn more about current text mining efforts useful in literature curation and will enable them to provide direct feedback to the text mining experts.

The intended audience includes both biocurators who do literature curation and developers involved in biocuration workflows. For more details and the workshop agenda, please go to http://www.biocreative.org/events/bcbiocuration2013/biocreative-text-mining-worksh/

Biocuration and scholarly communication cycle: roles and opportunities for biocurators: Tuesday 3-5pm

Chairs: Susanna Sansone (Oxford) and Carsten Kettner (Beilstein Institut)

The last few years have been marked by the arrival of data articles and data journals set to enhance and support sharing, reproducibility and reuse of research data underlying peer-reviewed publications.
The aim of this workshop is to explore potential synergies between biocurators of public repositories and publishers/editors who work to enhance their existing products or launch new journals to better ‘deal with data’.
To facilitate the discussion, a panel of representatives from the key stakeholder groups will give short presentations and perspectives on:
- the role for biocurators in the review of data associated with data journals
- the interplay and synergies between curation in public or in-house repositories vs data journals
- the use of community standards to enable consistency in the content and drive data reuse

To help us in shaping the panel’s presentations and the open discussion, we invite all meeting participants to submit questions via this form.

1. Introduction by chairs: Susanna-Assunta Sansone (University of Oxford) and Carsten Kettner (Beilstein-Institut)
2. Presentations and perspectives, panelists:
- Theodora Bloom (Public Library of Science)
- Ruth Wilson (Nature Publishing Group)
- Clare Garvey (BMC, Genome Biology)
- Rebecca Lawrence (Publisher, F1000Research, Faculty of 1000)
- Christoph Steinbeck (EBI MetaboLights and ChEBI, EMBL-EBI, Cambridge, UK)
- Ulrike Wittig (SABIO-RK Database, Heidelberg Institute for Theoretical Studies, Germany)
3. Open discussion

Reusing curated data to perform annotation with biomedical ontologies: Wednesday 3-5pm

Organizers and Presenters: James Malone, Simon Jupp, Tony Burdett (EBI)

Data sharing and integration have become integral to many biomedical data analysis approaches. As data providers and analysts seek to make sense of diverse and ever-changing experimental technologies for these purposes, the use of ontologies in the curation of biomedical data has an increasingly important role. This task is often manually intensive and requires many skilled experts. In this workshop we will present Zooma, a tool for improving the automation of the annotation of biomedical data with ontologies. Zooma is a knowledge base of expert curation knowledge, generated from a decade of manual ontology annotations made by the curators at ArrayExpress and the Gene Expression Atlas at EBI. Zooma exploits this vast knowledge and exposes it via a web interface and API, allowing a user to enter text and receive ontology suggestions based on previously curated data. We will demonstrate how to use Zooma for curation and invite participants to bring along their own data to try with the tool. We will also give an overview of using the API to perform curation using the web services. Finally, we will illustrate how new curation knowledge can be added to the Zooma model to improve curation performance in the future using Zooma’s additive Semantic Model.

Prerequisites to participate: There are no specific prerequisites to attend. Participants are encouraged to bring their own data examples, as a simple list of words or a spreadsheet, to try curating with the tool.

Handling Metagenomics Data: Wednesday 3-5pm

Chairs: Peter Sterk - Oxford e-Research Centre, University of Oxford, UK & Maria J. Martin - Team Leader UniProt (Development), EMBL-European Bioinformatics Institute, UK

Metagenomics is a growing discipline in biology that provides access to the genomes of communities of microbes, such as bacteria, archaea, viruses, protozoa and fungi, giving researchers a new way of studying the composition, dynamics and functionality of uncultured microbial communities. This field is likely to generate a considerable amount of collective genome data from microbial communities, with functional information relevant to the biological databases. The goal of this session is to review the current status of metagenomics research and for the speakers to present their current work and their views on future challenges and developments in the field. Special focus will be on data sources, annotation and functional prediction.