
CALBC Challenge: An Overview

We propose to create a broadly scoped and diversely annotated corpus (150,000 Medline abstracts on immunology annotated with about a dozen semantic groups) by automatically integrating the annotations from different named entity recognition and concept identification systems. The result of the integration process will be a silver standard corpus (SSC). Participation is open to any team willing to submit annotations obtained with their own named entity recognition or concept identification system. Participants will receive an assessment of their results against the SSC through a fully automated analysis.

For full details, please refer to the CALBC Web site: www.calbc.eu. The Web site will provide the latest information on the status of the challenge and its requirements.

Participants must sign up to the CALBC challenge mailing list: challenge@calbc.eu. To sign up to the mailing list, send a request to public@calbc.eu.

Once the challenge opens, submissions can be made and assessments will be provided on a regular basis. After the submission period closes, the submitted annotations will be used to generate a revised version of the SSC.


Corpus

The corpus consists of 150,000 Medline abstracts on immunology, fully sentencised (i.e., split into sentences).

Submission

The submission Web site for CALBC challenge participants is available here. Participants can submit their annotated corpus to the automatic evaluation system. Participants must register first (public@calbc.eu) to be able to use the evaluation system.

Three types of annotations will be considered: semantic groups, semantic types and concept identifiers. At least one assignment must be delivered, i.e. a semantic group, a semantic type or a concept id. Typically, annotations indicate the boundaries of the entity corresponding to the semantic group and concept, but annotations at the sentence level (without specifying the entity boundaries) are also allowed.
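For illustration, the sketch below shows what a single annotation record might carry in each case; the field names, offsets and identifiers are hypothetical and do not prescribe the actual submission format.

```python
# Illustrative only: these records are hypothetical, not the official CALBC
# submission format. Each annotation carries at least one assignment
# (semantic group, semantic type, or concept id); entity boundaries may be
# omitted if the annotation is made at the sentence level.
entity_level_annotation = {
    "doc_id": "12345678",           # Medline abstract identifier
    "start": 104, "end": 113,       # character offsets of the entity boundaries
    "semantic_group": "PRGE",       # e.g. a protein/gene group label
    "semantic_type": "T116",        # e.g. a UMLS semantic type
    "concept_id": "UniProt:P01375"  # concept identifier from a preferred resource
}

sentence_level_annotation = {
    "doc_id": "12345678",
    "sentence_index": 3,            # no entity boundaries, only the sentence
    "semantic_group": "DISO"
}
```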

Participants are encouraged to use a set of preferred semantic groups, semantic types and concept ids referring to concept systems such as the UMLS, UniProt, etc. If no preferred resources are used, participants must provide a description of their type system and vocabulary and a categorisation of their entities into the proposed semantic group system.

Participants are free to limit their systems to annotations that only cover a subset of the semantic groups.

For the preferred concept systems, if the semantic group is not specified explicitly, the evaluation system can derive it from the semantic type or, failing that, from the concept id.
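As a rough illustration of that fallback order, the following sketch assumes simple lookup tables from semantic types and concept ids to groups; the table contents are placeholders, not the mappings used by the evaluation system.

```python
# Hypothetical lookup tables; the real evaluation system presumably uses the
# full mappings of the preferred concept systems (UMLS, UniProt, etc.).
TYPE_TO_GROUP = {"T116": "PRGE", "T047": "DISO"}   # semantic type -> group
CONCEPT_TO_GROUP = {"UniProt:P01375": "PRGE"}      # concept id -> group

def derive_group(annotation):
    """Return the semantic group, deriving it from the semantic type or,
    failing that, from the concept id when no group is given explicitly."""
    if annotation.get("semantic_group"):
        return annotation["semantic_group"]
    if annotation.get("semantic_type") in TYPE_TO_GROUP:
        return TYPE_TO_GROUP[annotation["semantic_type"]]
    return CONCEPT_TO_GROUP.get(annotation.get("concept_id"))
```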

Challenge 1: Named Entity Recognition

Task


The participant’s system provides annotations of the boundaries and semantic groups of the entities it finds.

Evaluation and feedback


The evaluation against the SSC will be done separately for each semantic group that is annotated in the participant’s corpus.

Two different types of boundary alignments will be used:
  • Exact match evaluation: For each semantic group under scrutiny, the system’s annotations of the entity boundaries have to match the entity boundaries in the SSC exactly. This evaluation indicates how closely the participant’s boundaries agree with those annotated in the SSC.
  • Nested match evaluation: The annotations of the participant’s system have to include the boundaries of the SSC, i.e., the system annotates a span larger than or equal to the annotation contained in the SSC. This evaluation indicates whether the system identifies the complete location of the entity corresponding to the semantic group.
For each alignment, precision, recall and F-score will be determined. These measures will be calculated directly by the submission site. Detailed annotation results will be made available to each participant for the training set of 50,000 abstracts.
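As a simplified illustration of the two alignments and the derived measures, the sketch below treats the annotations of one semantic group as (start, end) character spans; it is not the official scorer, just one plausible reading of the evaluation.

```python
def exact_match(sys_span, ssc_span):
    # Exact match: system boundaries coincide with the SSC boundaries.
    return sys_span == ssc_span

def nested_match(sys_span, ssc_span):
    # Nested match: the system span encloses (or equals) the SSC span.
    return sys_span[0] <= ssc_span[0] and sys_span[1] >= ssc_span[1]

def precision_recall_f(sys_spans, ssc_spans, match):
    """Score one semantic group under a given boundary alignment
    (simplified: a span counts as correct if it matches any span on the other side)."""
    tp_sys = sum(1 for s in sys_spans if any(match(s, g) for g in ssc_spans))
    tp_ssc = sum(1 for g in ssc_spans if any(match(s, g) for s in sys_spans))
    precision = tp_sys / len(sys_spans) if sys_spans else 0.0
    recall = tp_ssc / len(ssc_spans) if ssc_spans else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# Example: the first system span is wider than the SSC span, so it only
# counts under the nested alignment.
sys_spans = [(10, 18), (40, 52)]   # system annotations for one semantic group
ssc_spans = [(12, 18), (40, 52)]   # SSC annotations for the same group
print(precision_recall_f(sys_spans, ssc_spans, exact_match))    # (0.5, 0.5, 0.5)
print(precision_recall_f(sys_spans, ssc_spans, nested_match))   # (1.0, 1.0, 1.0)
```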

Challenge 2: Concept Identification

Task


The participant’s system provides annotations of the boundaries and concept identifiers of the entities it finds.

Evaluation and feedback


The annotations will be evaluated against the SSC. Concept identifiers will be compared for the same two boundary alignments as in the first challenge and, additionally, at the sentence or document level. Again, precision, recall and F-score will be determined for each alignment, and detailed results will be provided for the training corpus.
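As a simplified illustration of a sentence-level comparison, the sketch below scores sets of concept ids per sentence, ignoring entity boundaries; the actual evaluation system may aggregate the counts differently.

```python
def sentence_level_scores(sys_concepts, ssc_concepts):
    """Compare sets of concept ids per sentence (dicts: sentence id -> set of ids).
    Simplified sketch, not the official scorer."""
    tp = fp = fn = 0
    for sent_id in set(sys_concepts) | set(ssc_concepts):
        predicted = sys_concepts.get(sent_id, set())
        gold = ssc_concepts.get(sent_id, set())
        tp += len(predicted & gold)   # concept ids found in both
        fp += len(predicted - gold)   # concept ids only in the submission
        fn += len(gold - predicted)   # concept ids only in the SSC
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```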