CALBC Challenge: An Overview
We propose to create a broadly scoped and diversely annotated corpus (150,000 Medline abstracts on immunology, annotated with about a dozen semantic groups) by automatically integrating the annotations from different named entity recognition and concept identification systems. The result of the integration process will be a silver standard corpus (SSC).
Participation is open to any team that is willing to submit annotations obtained with their own named entity recognition or concept identification system. Participants will receive an assessment of their results against the SSC through a fully automated analysis.
Corpus
The corpus consists of 150,000 Medline abstracts on immunology (fully sentencised).
Submission
The submission Web site for CALBC challenge participants is here.
Participants can submit their annotated corpus to the automatic evaluation system. Participants have to register first (public@calbc.eu) to be able to use the evaluation system.
Three types of annotations will be considered: semantic groups, semantic types and concept identifiers. It is necessary to deliver at least one assignment, i.e. semantic group or semantic type or concept id. Typically, annotations indicate the boundaries of the entity corresponding with the semantic group and concept, but annotations at the sentence level (without specifying the entity boundaries) are allowable.
Participants are encouraged to make use of a set of preferred semantic groups, semantic types and concept ids referring to concept systems such as the UMLS, UniProt, etc. If no preferred resources are used, participants must provide a description of their type system and vocabulary and a categorisation of their entities in the proposed semantic group system. Participants are free to limit their systems to annotations that only cover a subset of the semantic groups. For the preferred concept systems, if the semantic group is not specified explicitly, the evaluation system is able to derive semantic groups from semantic types or from concept ids in this order.
Challenge 1: Named Entity Recognition
Task
The participant’s system provides annotations of the boundaries and semantic groups of the found entities.
Evaluation and feedback
The evaluation against the SSC will be done separately for each semantic group that is annotated in the participant’s corpus.
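The fallback described above for the preferred concept systems (explicit semantic group first, then semantic type, then concept id) and the per-group evaluation can be illustrated with a minimal sketch. It is not the CALBC evaluation system: the dictionary-based annotation format, the lookup tables and all identifiers in them are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical lookup tables. Real mappings would come from the preferred
# resources (e.g. the UMLS); these identifiers are placeholders, not real ids.
TYPE_TO_GROUP: Dict[str, str] = {"type:protein": "group:chemicals"}
CONCEPT_TO_GROUP: Dict[str, str] = {"concept:0001": "group:disorders"}


def resolve_group(ann: dict) -> Optional[str]:
    """Return the semantic group: explicit group, else via type, else via concept id."""
    if ann.get("group"):
        return ann["group"]
    if ann.get("type") in TYPE_TO_GROUP:
        return TYPE_TO_GROUP[ann["type"]]
    # Returns None when the concept id is missing or unknown.
    return CONCEPT_TO_GROUP.get(ann.get("concept"))


def bucket_by_group(annotations: List[dict]) -> Dict[str, List[dict]]:
    """Split annotations by resolved semantic group so each group can be scored separately."""
    buckets: Dict[str, List[dict]] = defaultdict(list)
    for ann in annotations:
        group = resolve_group(ann)
        if group is not None:  # annotations with no derivable group are skipped
            buckets[group].append(ann)
    return dict(buckets)
```

In this sketch an annotation that carries an unknown semantic type but a known concept id still ends up in a group, because the concept id is consulted only after the type lookup fails.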
Two different types of boundary alignments will be used:
Challenge 2: Concept Identification
Task
The participant’s system provides annotations of the boundaries and concept identifiers of the found entities.
Evaluation and feedback
The annotations will be evaluated against the SSC. Concept identifiers will be compared for the two boundary alignments as for the first challenge and additionally at the sentence level or document level. Again, precision, recall and F-score will be determined for each alignment, and detailed results will be provided for the training corpus.
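As a rough illustration of how precision, recall and F-score might be computed for concept identifiers, the sketch below scores a submission against a reference set twice: once requiring identical entity boundaries, and once at the sentence level, where boundaries are ignored. The tuple layout, the function names and the use of the balanced F-score (F1) are assumptions made for this example; the actual alignment rules of the evaluation system are not reproduced here.

```python
from typing import Set, Tuple

# Assumed annotation layout: (document id, sentence index, start, end, concept id)
Annotation = Tuple[str, int, int, int, str]


def prf(predicted: Set, reference: Set) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-score over two sets of comparable items."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def to_sentence_level(annotations: Set[Annotation]) -> Set[Tuple[str, int, str]]:
    """Drop the boundaries, keeping only (document, sentence, concept id)."""
    return {(doc, sent, cid) for (doc, sent, _start, _end, cid) in annotations}


# Toy data: the second annotation differs from the reference by one character offset.
submitted = {("doc1", 0, 0, 5, "concept:A"), ("doc1", 1, 10, 15, "concept:B")}
reference = {("doc1", 0, 0, 5, "concept:A"), ("doc1", 1, 11, 15, "concept:B")}

exact_scores = prf(submitted, reference)
sentence_scores = prf(to_sentence_level(submitted), to_sentence_level(reference))
```

With this toy data the boundary mismatch counts as an error under the exact comparison but disappears at the sentence level, which is why the two views can give different scores for the same submission.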