CALBC Challenge Tasks

Participants' input (annotated corpora) is used to generate the Silver Standard Corpus II in the first round of the CALBC challenge (Challenge I, 2009/2010) and the Silver Standard Corpus III in the second round (Challenge II, 2010/2011).

Evaluation of participants' input

Each contribution will be evaluated against a harmonized set built from all participants' contributions to the same challenge: the contributions are first harmonized, and each contribution is then assessed against the harmonized set (see Rebholz-Schuhmann et al., JBCB 2009).

Challenge task A: Named Entity Recognition

Task

The participant's system provides annotations of the boundaries and semantic groups of the entities it finds. Example (see annotation guidelines):

The <e id=":::PRGE">INS gene</e> regulates the molecular expression of ...

Evaluation and feedback

The evaluation against the Silver Standard Corpus II (SSC-II) will be done separately for each semantic group that is annotated in the participant's corpus. Two different types of boundary alignments will be used.
For each alignment, precision, recall and F-score will be determined. These measures will be calculated directly by the submission site. Detailed annotation results of the system will be made available to each participant for the training set of 50,000 abstracts.

Challenge task B: Concept identification

Task

The participant's system provides annotations of the boundaries and concept identifiers of the entities it finds. Example (see annotation guidelines):

The <e id="Uniprot:P01308:T028|UMLS:C1337112:T028">INS gene</e> regulates the molecular expression of...

Evaluation and feedback

The annotations will be evaluated against the Silver Standard Corpus II (SSC-II). Concept identifiers will be compared for the same two boundary alignments as for the first challenge task, and additionally at the sentence level or document level. Again, precision, recall and F-score will be determined for each alignment.
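
To make the scoring concrete, the following is a minimal sketch of how inline `<e id="...">` annotations could be read and compared with a reference set. It is not the submission site's implementation. The function names `parse_annotations` and `precision_recall_f` are hypothetical, the sketch assumes non-nested tags, it uses exact boundary matching only (the two alignment types used by the challenge are not reproduced here), and it assumes the balanced F-score (harmonic mean of precision and recall).

```python
import re

# Matches the inline markup shown in the examples: <e id="...">text</e>.
# Assumption: annotations are not nested inside one another.
TAG = re.compile(r'<e id="([^"]*)">(.*?)</e>')


def parse_annotations(marked_up):
    """Return (plain_text, annotations).

    Each annotation is (start, end, id_string), with character offsets
    into plain_text (the input with the <e> tags removed).
    """
    plain_parts = []
    annotations = []
    pos = 0      # read position in the marked-up input
    offset = 0   # current length of the plain text built so far
    for match in TAG.finditer(marked_up):
        before = marked_up[pos:match.start()]
        plain_parts.append(before)
        offset += len(before)
        surface = match.group(2)
        annotations.append((offset, offset + len(surface), match.group(1)))
        plain_parts.append(surface)
        offset += len(surface)
        pos = match.end()
    plain_parts.append(marked_up[pos:])
    return "".join(plain_parts), annotations


def precision_recall_f(system, reference):
    """Score system annotations against reference annotations.

    Both arguments are iterables of hashable items, for example
    (start, end, label) tuples; an item counts as correct only if it
    is identical in both sets (exact match, an assumption of this sketch).
    """
    system, reference = set(system), set(reference)
    true_positives = len(system & reference)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

Applied to the task A example, `parse_annotations('The <e id=":::PRGE">INS gene</e> regulates the molecular expression of ...')` yields the plain sentence and the single annotation `(4, 12, ':::PRGE')`. To score per semantic group, as the task A evaluation describes, the annotations would be filtered by group before calling `precision_recall_f`.
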
Glossary

Silver Standard Corpus I: 50,000 documents selected at random from an immunology corpus of 1 million Medline abstracts, annotated by 5 annotation systems. This corpus serves as training input [if required] for the first round of the CALBC challenge.

Silver Standard Corpus II: 100,000 documents selected at random from an immunology corpus of 1 million Medline abstracts, annotated by participants' annotation systems that took part in the first challenge. This corpus will be available for the second round of the CALBC challenge.

Silver Standard Corpus III: up to 850,000 documents from an immunology corpus of 1 million Medline abstracts, annotated by participants' annotation systems that took part in the second challenge.