
CALBC Challenge Tasks

Participants' input (annotated corpora) is used to generate the Silver Standard Corpus II in the first round of the CALBC challenge (Challenge I, 2009/2010) and the Silver Standard Corpus III in the second round (Challenge II, 2010/2011).


Evaluation of participants' input

The participants' contributions to a challenge will first be harmonized, and each contribution will then be assessed against the harmonized set from that same challenge (see Rebholz-Schuhmann et al., JBCB 2009).

Challenge task A: Named Entity Recognition


Task

The participant's system provides annotations of the boundaries and semantic groups of the found entities.

Example (see annotation guidelines):

The <e id=":::PRGE">INS gene</e> regulates the molecular expression of ...
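As an illustration of the annotation format in the example above, the following minimal sketch strips the inline tags and recovers entity boundaries and semantic groups. It assumes that the semantic group is the last colon-separated field of the id attribute, as in ":::PRGE"; the function name extract_entities is hypothetical, and the annotation guidelines remain the authoritative description of the format.

```python
import re

# Matches <e id="...">text</e>, following the shape of the example above.
TAG = re.compile(r'<e id="([^"]*)">(.*?)</e>')


def extract_entities(annotated):
    """Return (plain_text, [(start, end, semantic_group), ...]).

    Offsets refer to the plain text with all <e> tags removed.
    """
    plain_parts = []
    entities = []
    pos = 0      # read position in the annotated string
    offset = 0   # write position in the plain text
    for m in TAG.finditer(annotated):
        before = annotated[pos:m.start()]
        plain_parts.append(before)
        offset += len(before)
        ident, surface = m.group(1), m.group(2)
        # Assumption: the semantic group is the last colon-separated field.
        group = ident.split(":")[-1]
        entities.append((offset, offset + len(surface), group))
        plain_parts.append(surface)
        offset += len(surface)
        pos = m.end()
    plain_parts.append(annotated[pos:])
    return "".join(plain_parts), entities


text = 'The <e id=":::PRGE">INS gene</e> regulates the molecular expression of ...'
plain, entities = extract_entities(text)
print(plain)
print(entities)
```

The sketch does not handle nested tags; whether the format allows them is a question for the annotation guidelines.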


Evaluation and feedback

The evaluation against the Silver Standard Corpus II (SSC-II) will be done separately for each semantic group that is annotated in the participant's corpus.

Two different types of boundary alignments will be used:

  • Exact match evaluation: For each semantic group under scrutiny, the entity boundaries annotated by the system must match the entity boundaries in the SSC-II exactly. A high score in this evaluation means that the participant agrees closely with the SSC-II on the boundaries of the annotated entities.
  • Nested match evaluation: The annotations of the participant's system must enclose the boundaries of the SSC-II annotations, i.e., the system annotates a span that is larger than or equal to the corresponding span in the SSC-II. A high score in this evaluation means that the system finds the complete location of the entity belonging to the semantic group.

For each alignment, precision, recall and F-score will be determined. These parameters will be calculated directly by the submission site. Detailed annotation results of the system will be made available to each participant for the training set of 50,000 abstracts.
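The two boundary alignments and the three scores can be sketched as follows. This is only an illustration under stated assumptions: spans are (start, end) character offsets for one semantic group in one document, a system annotation counts toward precision if it matches any SSC-II annotation, and an SSC-II annotation counts toward recall if any system annotation matches it. How the submission site actually counts matches is not described here, and the function names are hypothetical.

```python
def exact_match(sys_span, ssc_span):
    """Exact match: both boundaries coincide."""
    return sys_span == ssc_span


def nested_match(sys_span, ssc_span):
    """Nested match: the system span is larger than or equal to the SSC span."""
    return sys_span[0] <= ssc_span[0] and sys_span[1] >= ssc_span[1]


def evaluate(system, silver, match):
    """Return (precision, recall, f_score) for one semantic group."""
    # System annotations that are confirmed by at least one SSC annotation.
    matched_sys = sum(1 for s in system if any(match(s, g) for g in silver))
    # SSC annotations that are found by at least one system annotation.
    matched_ssc = sum(1 for g in silver if any(match(s, g) for s in system))
    precision = matched_sys / len(system) if system else 0.0
    recall = matched_ssc / len(silver) if silver else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Illustrative spans only, not taken from any corpus.
system = [(4, 12), (20, 30)]
silver = [(4, 12), (22, 28), (40, 45)]
print(evaluate(system, silver, exact_match))
print(evaluate(system, silver, nested_match))
```

In this toy input the second system span encloses an SSC-II span without matching it exactly, so it counts only under the nested alignment.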

Challenge task B: Concept identification


Task

The participant's system provides annotations of the boundaries and concept identifiers of the found entities.

Example (see annotation guidelines):

The <e id="Uniprot:P01308:T028|UMLS:C1337112:T028">INS gene</e> regulates the molecular expression of...
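The id attribute in the example above can be split into its individual concept references. The sketch below assumes, based only on this example, that entries are separated by "|" and that each entry has three colon-separated fields (resource, concept identifier, semantic type). The function name and the dictionary keys are hypothetical; the annotation guidelines define the actual format.

```python
def parse_concept_ids(id_attr):
    """Split an id attribute into a list of concept references.

    Assumption: each '|'-separated entry has exactly three ':'-separated
    fields, as in 'Uniprot:P01308:T028'. Other shapes raise ValueError.
    """
    concepts = []
    for entry in id_attr.split("|"):
        resource, concept_id, semantic_type = entry.split(":")
        concepts.append(
            {
                "resource": resource,
                "concept_id": concept_id,
                "semantic_type": semantic_type,
            }
        )
    return concepts


for concept in parse_concept_ids("Uniprot:P01308:T028|UMLS:C1337112:T028"):
    print(concept["resource"], concept["concept_id"], concept["semantic_type"])
```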


Evaluation and feedback

The annotations will be evaluated against the Silver Standard Corpus II (SSC-II). Concept identifiers will be compared for the same two boundary alignments as in challenge task A, and additionally at the sentence level or document level. Again, precision, recall and F-score will be determined for each alignment.
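A comparison at the sentence or document level can ignore boundaries and compare only which concept identifiers occur in the unit. The sketch below assumes that each unit (one sentence or one document) is reduced to a set of identifiers and that the scores are computed on the overlap of the two sets. This is one plausible reading rather than the documented procedure, and the function name is hypothetical.

```python
def set_level_scores(system_ids, silver_ids):
    """Return (precision, recall, f_score) for one sentence or document.

    Both arguments are iterables of concept identifiers; duplicates and
    entity boundaries are ignored.
    """
    system_ids = set(system_ids)
    silver_ids = set(silver_ids)
    shared = system_ids & silver_ids
    precision = len(shared) / len(system_ids) if system_ids else 0.0
    recall = len(shared) / len(silver_ids) if silver_ids else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# "UMLS:C0000000" is a placeholder identifier used only for illustration.
system_doc = ["Uniprot:P01308", "UMLS:C0000000"]
silver_doc = ["Uniprot:P01308", "UMLS:C1337112"]
print(set_level_scores(system_doc, silver_doc))
```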

Glossary

Silver Standard Corpus I: 50,000 documents selected at random from an immunology corpus of 1 million Medline abstracts, annotated by 5 annotation systems. This corpus serves as training input [if required] for the first round of the CALBC challenge.

Silver Standard Corpus II: 100,000 documents selected at random from an immunology corpus of 1 million Medline abstracts, annotated by participants' annotation systems that took part in the first challenge. This corpus will be available for the second round of the CALBC challenge.

Silver Standard Corpus III: up to 850,000 documents from an immunology corpus of 1 million Medline abstracts, annotated by participants' annotation systems that took part in the second challenge.
