Galaxy workshop: Monday 3-5pm
Chairs: Suzanna Lewis and Dave Clements
Galaxy is a data integration and analysis platform for the life sciences. GO Galaxy (http://galaxy.berkeleybop.org/) is a free, Galaxy-based public web server that provides a simple integrated environment in which ontological analysis tools can be linked into workflows. The GO Tools page lists more than 50 tools for GO-based analyses, but these are not well integrated. AmiGO/GOOSE offers some functionality, such as slimming, enrichment and data extraction, but these functions are difficult to chain together. The GO Galaxy server addresses these needs. This workshop will introduce Galaxy and then demonstrate how to use it for:
- Basic genomic analysis, and
- Enrichment analysis using the GO Galaxy Server.
The motivating GO Galaxy application is enrichment analysis: given a set of genes, which biological process(es) is the set enriched for? A standard for representing the results of term enrichment analyses is proposed so that results from alternative tools can be compared directly.
Participants will gain specific knowledge about how to use Galaxy to perform these types of analysis and how to repeat, reuse, share, and publish their analyses. This workshop is geared towards biologists. No programming or command line knowledge is assumed or required. A basic knowledge of ontology principles will be helpful.
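The enrichment question above can be illustrated with a one-sided hypergeometric test. This is only a minimal sketch of one common statistical approach; it is not necessarily the method used by the GO Galaxy tools, and the gene names, set sizes and function name are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, annotated_in_sample):
    """One-sided hypergeometric p-value: P(X >= annotated_in_sample).

    population          -- number of genes in the background set
    annotated           -- background genes annotated to the term
    sample              -- number of genes in the study set
    annotated_in_sample -- study-set genes annotated to the term
    """
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(annotated_in_sample, upper + 1)
    )
    return favourable / comb(population, sample)


# Hypothetical toy data: 20 background genes, 5 of them annotated to a term.
background = {f"gene{i}" for i in range(1, 21)}
term_genes = {"gene1", "gene2", "gene3", "gene4", "gene5"}
study_set = {"gene1", "gene2", "gene3", "gene10"}

overlap = len(study_set & term_genes)
p = enrichment_p_value(len(background), len(term_genes), len(study_set), overlap)
print(f"overlap={overlap}, p={p:.4f}")
```

A small p-value suggests that the study set contains more genes annotated to the term than would be expected by chance. In practice a correction for testing many terms would also be needed.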
On Sequence Similarity and using Web Services for biocuration (2 presentations of 1 hour each): Monday 3-5pm
Chairs: Dr. William Pearson (UVA), Rodrigo Lopez and Hamish McWilliam (EBI)
Sequence similarity searching is the most powerful tool for inferring evolutionary relationships between genes, assigning function to novel genes and ultimately annotating entire genomes and their proteomes. Dr. William Pearson will be talking about sequence similarity searching from the perspective of gene products and describing search results beyond the humble sequence alignments. His talk is entitled "Looking at Proteins, Using More Than Sequences".
Access to databases and analytical tools using Web Services perfectly suits the biological data curation process. These services allow for the seamless integration of large databases and complex analytical tools into the curation workbench. Rodrigo Lopez and Hamish McWilliam will be talking about the technology and tools available to the biocuration process. They will present examples of existing curation pipelines that use these services and invite participants to take part in and actively influence their development.
1. Looking at Proteins, Using More Than Sequences: Dr. William Pearson, UVA.
2. EBI Web Services for biocuration: Rodrigo Lopez/Hamish McWilliam, EBI.
Variation annotation workshop (LOVD): Monday 3-5pm
Chairs: Dr Raymond Dalgleish (Univ Leicester) and Peter Taschner (Leiden Univ Medical Centre)
The Leiden Open Variation Database system (LOVD: http://lovd.nl/) is the leading solution for the online gene-centric collection and display of DNA variation. This variation is often associated with inherited diseases, and LOVD databases have been created for every disease gene that is described in Online Mendelian Inheritance in Man (OMIM: http://www.ncbi.nlm.nih.gov/omim). Many of these databases are actively curated, but others are in need of enthusiastic curators to adopt them and develop their data content. We will present an introduction to LOVD and explain the tasks involved in discovering and curating data for entry into the database, as well as covering aspects of variant description, including reporting standards and reference sequences.
Emerging standards for genome annotation and curation in the era of high throughput sequencing: Tuesday 3-5pm
Chair: Kim Pruitt (NCBI)
Next-generation sequencing has enabled researchers to perform genomic and transcriptomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which leads to a significant increase in the number of assembled genomes being deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the submission, annotation and analysis pipelines. New standards for the submission, validation, analysis, and curation of genome data must be developed for both reference genomes and population studies derived from clinical outbreaks. This workshop will provide an overview of the interplay between computational processes, tools, and curation activities, and of how this combined approach is poised to deal with the data onslaught while continuing to maintain data integrity within NCBI resources. Information will be provided about efforts underway to streamline data submission, improve annotation pipelines, automate validation and provide analysis tools for data visualization across the multitude of genomes. Talks and discussions will clarify the important role of curation within these process flows and how it helps to improve data quality.
The participants of the workshop will benefit from understanding NCBI process flows and curation protocols, as well as the standards and policies for data quality assurance developed by NCBI in conjunction with the genome community.
Topics and speakers
1 Data Submission processing: using computational and curation approaches to streamline submissions and validation and improve public data – speaker: Ilene Mizrachi
2 Using NGS in Eukaryotic genome projects: how curation activities help improve outcomes of the eukaryotic genome annotation pipeline, sequence variation, and browser resources – speaker: Kim Pruitt
3 Managing high-throughput prokaryotic genome projects: streamlining genome annotation and curation activities to manage bacterial reference genomes and sequence variation – speaker: Tatiana Tatusova
BioCreative Text Mining Workshop for Biocuration: Tuesday 3-5pm
Presenters: Cecilia Arighi (1), Kevin Cohen (2), Martin Krallinger (3), and Zhiyong Lu (4)
(1) Center for Bioinformatics and Computational Biology, University of Delaware, DE, USA
(2) Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO, USA
(3) Structural Biology and Biocomputing Group, Spanish National Cancer Research Centre, Madrid, Spain
(4) National Center for Biotechnology Information, NIH, Bethesda, MD, USA
BioCreative: Critical Assessment of Information Extraction in Biology is an international community-wide effort that evaluates text mining and information extraction systems applied to the biological domain (http://www.biocreative.org/). A unique characteristic of this effort is its collaborative and interdisciplinary nature, as it brings together experts from various fields, including text mining, biocuration, publishing houses and bioinformatics. This allows participants in the accompanying BioCreative Workshops to discuss how to drive the development of text mining systems that can be integrated into the biocuration workflow and the knowledge discovery process. To address the current barriers to using text mining in biology, BioCreative has also been conducting user requirements analyses and user-based evaluations, and fostering standards development for text mining tool re-use and integration.
This workshop will present several text-mining research topics addressed by the BioCreative efforts that are of particular relevance for literature curation. These topics include the extraction of bio-entity annotations using standard bio-ontologies (e.g. Gene Ontology annotation), the identification of bio-entities relevant for curation (e.g. chemical compounds and drugs), and aspects dealing with text mining systems’ utility/usability and interoperability.
The aim of this workshop is to encourage active involvement of biocurators in guiding text mining system development and adoption by demonstrating and discussing past and current efforts of the BioCreative challenges. Participation in this workshop will give biocurators the possibility to learn more about current text mining efforts useful in literature curation and will enable them to provide direct feedback to the text mining experts.
The intended audience includes both biocurators who do literature curation and developers involved in biocuration workflows. For more details and the workshop agenda, please go to http://www.biocreative.org/events/bcbiocuration2013/biocreative-text-mining-worksh/
Biocuration and scholarly communication cycle: roles and opportunities for biocurators: Tuesday 3-5pm
Chairs: Susanna Sansone (Oxford) and Carsten Kettner (Beilstein-Institut)
The last few years have been marked by the arrival of data articles and data journals set to enhance and support sharing, reproducibility and reuse of research data underlying peer-reviewed publications.
The aim of this workshop is to explore potential synergies between biocurators of public repositories, and publishers/editors that work to enhance their existing products or launch new journals to better ‘deal with data’.
To facilitate the discussion, a panel of representatives from the key stakeholder groups will give short presentations and perspectives on:
- the role for biocurators in the review of data associated with data-journals
- the interplay and synergies between curation in public or in-house repositories vs data-journals
- the use of community standards to enable consistency in the content and drive data reuse
To help us in shaping the panel’s presentations and the open discussion, we invite all meeting participants to submit questions via this form.
1. Introduction by chairs: Susanna-Assunta Sansone (University of Oxford) and Carsten Kettner (Beilstein-Institut)
2. Presentations and perspectives, panelists:
- Theodora Bloom (Public Library of Science)
- Ruth Wilson (Nature Publishing Group)
- Clare Garvey (BMC, Genome Biology)
- Rebecca Lawrence (Publisher, F1000Research, Faculty of 1000)
- Christoph Steinbeck (EBI MetaboLights and ChEBI, EMBL-EBI, Cambridge, UK)
- Ulrike Wittig (SABIO-RK Database, Heidelberg Institute for Theoretical Studies, Germany)
3. Open discussion
Reusing curated data to perform annotation with biomedical ontologies: Wednesday 3-5pm
Organizers and Presenters: James Malone, Simon Jupp, Tony Burdett (EBI)
Data sharing and integration have become integral to many biomedical data analysis approaches. As data providers and analysts seek to make sense of diverse and ever-changing experimental technologies for these purposes, the use of ontologies in the curation of biomedical data has an increasingly important role. This task is often manually intensive and requires many skilled experts. In this workshop we will present Zooma, a tool for improving the automation of annotating biomedical data with ontologies. Zooma is a knowledge base of expert curation knowledge, generated from a decade of manual ontology annotations made by the curators at ArrayExpress and the Gene Expression Atlas at EBI. Zooma exploits this vast knowledge and exposes it via a web interface and API, allowing a user to enter text and receive ontology suggestions based on previously curated data. We will demonstrate how to use Zooma for curation and invite participants to bring along their own data to try with the tool. We will also give an overview of using the API to perform curation using the web services. Finally, we will illustrate how new curation knowledge can be added to the Zooma model to improve curation performance in the future using Zooma’s additive Semantic Model.
Prerequisites to participate: There are no specific prerequisites to attend. Participants are encouraged to bring their own data examples, as a simple list of words or a spreadsheet, to try curating with the tool.
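The idea of suggesting ontology terms from previously curated data, and of adding new curation knowledge over time, can be sketched as a small in-memory lookup. This is a toy illustration only: it does not use or reproduce the Zooma API or its Semantic Model, and the class name, method names and term identifiers (EX:0001, EX:0002) are hypothetical placeholders.

```python
from collections import Counter, defaultdict


class CurationStore:
    """Toy store of past curation decisions (not the Zooma implementation)."""

    def __init__(self):
        # normalised text -> Counter of term identifiers chosen by curators
        self._seen = defaultdict(Counter)

    @staticmethod
    def _normalise(text):
        # Lower-case and collapse whitespace so trivial variants match.
        return " ".join(text.lower().split())

    def add(self, text, term_id):
        """Record that a curator annotated `text` with `term_id`."""
        self._seen[self._normalise(text)][term_id] += 1

    def suggest(self, text):
        """Return previously used terms for `text`, most frequent first."""
        counts = self._seen.get(self._normalise(text))
        return [term for term, _ in counts.most_common()] if counts else []


store = CurationStore()
store.add("Liver", "EX:0001")   # hypothetical identifiers
store.add("liver", "EX:0001")
store.add("liver ", "EX:0002")

suggestions = store.suggest("  LIVER ")
unknown = store.suggest("kidney")
print(suggestions, unknown)
```

Each call to `add` makes later suggestions for the same text better informed, which is the additive behaviour described above in its simplest form. A real system would also need fuzzy matching and provenance for each annotation.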
Handling Metagenomics Data: Wednesday 3-5pm
Chairs: Peter Sterk - Oxford e-Research Centre, University of Oxford, UK & Maria J. Martin - Team Leader UniProt (Development), EMBL-European Bioinformatics Institute, UK
Metagenomics is a growing discipline in biology that provides access to the genomes of communities of microbes, such as bacteria, archaea, viruses, protozoa and fungi, giving researchers a new way to study the composition, dynamics and functionality of uncultured microbial communities. This field is likely to generate a considerable amount of data on the collective genomes of microbial communities, with relevant functional information for the biological databases. The goal of this session is to review the current status of metagenomics research and for the speakers to present their current work and their views on future challenges and developments in the field. Special focus will be on data sources, annotation and functional prediction.
Topics: computational metagenomics, metaproteomics, metatranscriptomics, applied metagenomics (applications in medicine, agriculture, alternative energy, ecology, etc.), metagenome standards, metagenomics and databases.
'Ocean Sampling Day' - Dawn Field
'Minimum Information standards for metagenomic studies' - Pelin Yilmaz
'The EBI Metagenomics service' - Sarah Hunter
'TARA Oceans' - Pascal Hingamp
The speakers and participants will offer their views on new developments and future challenges in the growing field of metagenomic studies.
Connecting scientific articles with research data: Wednesday 3-5pm
Organizers: Simone Groothuis (STM Journals, Elsevier) and Elena Zudilova-Seinstra (Journal & Content Technology, Elsevier)
Includes panel discussion – panel members: Mary Shimoyama (Rat Genome Database) and Judith Blake (Mouse Genome Informatics)
Scientific research is becoming increasingly computational and data intensive. At the moment, scientific articles and related data are mostly available separately, with articles typically accessed through publishing platforms like Elsevier’s ScienceDirect, and public data stored at databases run by research institutions.
For the work of a researcher, both data and scientific articles are of high relevance, both separately and in relation to each other. As we feel researchers can profit from closer connections between the two, we try to work with the research and biocuration community to make this happen within Elsevier’s platform ScienceDirect.
Elsevier is among the first publishing companies to have begun enriching its online articles by establishing links between articles and major scientific databases. Participants of this workshop will learn about existing opportunities for partnerships between biocurated databases and publishing companies and participate in the discussion on improving existing practices.
Participants will also learn of a number of Elsevier initiatives around data publication, including the Article of the Future project, which focuses on providing ScienceDirect users with access to additional content at the right moment and in the right place.