CALBC Challenge: Participation

CALBC Challenge II opened on September 13th, 2010. The challenge mailing list is open for registration (see below). If you are interested in the project, you can find more information on this website; please don't hesitate to contact us.

Who can participate?

Participation is open to any team that is willing to submit annotations obtained with their own named entity recognition or concept identification system. Participants will receive an assessment of their results against the Silver Standard Corpus (SSC) through a fully automated analysis.

How can I participate?

To participate in the CALBC challenge, you only have to send a registration request to public@calbc.eu. You will be signed up to the CALBC challenge mailing list (challenge@calbc.eu) and will receive the login details for the CALBC submission site, which gives you access to the annotation guidelines, corpora, resources and evaluation services.

Where do I get access to resources?

The resources for the CALBC challenge are distributed through the CALBC submission site. You need your login details to access them (see How can I participate?).

How do I annotate the corpus?

After downloading the corpus, participants have to annotate it according to the annotation guidelines.

There are two tasks (for further information see challenge tasks or the guidelines):

Task A ("Named Entity Recognition"): To annotate the corpus with entity boundaries for one or more semantic groups, e.g. genes/proteins, diseases, species, or others.

Task B ("Concept identification"): To annotate the corpus with boundaries and concept identifiers for the entities.

Participants are encouraged to make use of a set of preferred semantic groups, semantic types and concept ids referring to concept systems such as the UMLS, UniProt, etc. If no preferred resources are used, participants must provide a description of their type system and vocabulary and a categorisation of their entities in the proposed semantic group system.

Participants are free to limit their systems to annotations that only cover a subset of the semantic groups.

How can I submit my annotated corpus?

Participants can submit their annotated corpus to the automatic evaluation system through the CALBC submission site. Participants have to register on the submission site first (see How can I participate?).

What does the automatic evaluation system deliver?

Upon submission, the automatic evaluation system will produce basic statistics on the number of annotations and the distribution of types. After the challenge is closed, the submission will be evaluated against the Silver Standard Corpus that will be produced from the submissions as part of the challenge (SSC-II for the first round, SSC-III for the second round).
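
The basic statistics mentioned above can be illustrated with a minimal sketch. It assumes each annotation is a tuple of document id, start offset, end offset and semantic group; this layout, the function name and the group labels are hypothetical and are not taken from the CALBC submission format.

```python
from collections import Counter

def basic_statistics(annotations):
    """Count annotations overall and per semantic group.

    `annotations` is an iterable of (doc_id, start, end, group) tuples.
    This tuple layout is an assumption made for illustration only.
    """
    annotations = list(annotations)
    per_group = Counter(group for _, _, _, group in annotations)
    return {"total": len(annotations), "by_group": dict(per_group)}

# Hypothetical sample submission with made-up group labels.
sample = [
    ("doc1", 0, 5, "gene_protein"),
    ("doc1", 10, 18, "disease"),
    ("doc2", 3, 9, "gene_protein"),
]
stats = basic_statistics(sample)
print(stats)
```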

What will be evaluated in a participant's annotated corpus?

The scale of the corpus means that manual curation will not be possible (see BioCreative I and II). All harmonisation steps have to be performed automatically. The evaluation of the annotations against the silver standard will be carried out by purely automatic means on a farm of compute servers to enable acceptable response cycles.

After submission, a fully automated analysis system will immediately start the analysis and alignment process. The results of the alignment of the submitted corpus against the Silver Standard will be reported as soon as the alignment is finished (estimated to take approximately one day). The analysis process will deliver statistical parameters that help to interpret the submitted annotations and their performance in comparison to the test corpus. The annotation results of the participating systems will be made available to each participant for a subset of 50,000 abstracts of the corpus.

For CALBC challenge Task A ("Named Entity Recognition") the evaluation system will compare the boundaries in the participant's corpus against the boundaries in the Silver Standard Corpus and will calculate precision, recall and F-measure. Only boundaries of the same semantic group in both corpora are compared against each other.
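
As a rough illustration of this comparison, the sketch below scores a submission per semantic group. It assumes exact boundary matching and scores only the groups present in the submission; the actual CALBC evaluation system may use different matching criteria. The tuple layout, function names and group labels are hypothetical.

```python
from collections import defaultdict

def precision_recall_f(submitted, reference):
    """Score two sets of spans; a span counts only on an exact match (assumption)."""
    true_positives = len(submitted & reference)
    precision = true_positives / len(submitted) if submitted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_measure

def evaluate_task_a(submitted, reference):
    """Compare boundaries only within the same semantic group.

    Both arguments are iterables of (doc_id, start, end, group) tuples,
    a hypothetical layout chosen for this sketch.
    """
    sub_by_group = defaultdict(set)
    ref_by_group = defaultdict(set)
    for doc_id, start, end, group in submitted:
        sub_by_group[group].add((doc_id, start, end))
    for doc_id, start, end, group in reference:
        ref_by_group[group].add((doc_id, start, end))
    return {
        group: precision_recall_f(spans, ref_by_group.get(group, set()))
        for group, spans in sub_by_group.items()
    }

submitted = [
    ("doc1", 0, 5, "gene_protein"),
    ("doc1", 10, 18, "disease"),       # same span, but a different group in the reference
    ("doc1", 20, 25, "gene_protein"),
]
reference = [
    ("doc1", 0, 5, "gene_protein"),
    ("doc1", 10, 18, "gene_protein"),
    ("doc1", 30, 35, "gene_protein"),
]
task_a_scores = evaluate_task_a(submitted, reference)
print(task_a_scores)
```

In this example the span at offsets 10 to 18 earns no credit, because the submission and the reference assign it to different semantic groups.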

For the challenge Task B ("Concept Identification") the evaluation system will compare the delivered concept ids in combination with the boundaries against the Silver Standard Corpus and will calculate precision, recall and F-measure.
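
A minimal sketch of the Task B comparison follows, assuming that an annotation counts as correct only when both its boundaries and its concept id match the reference exactly. The real evaluation system may apply other criteria, and the tuple layout and concept ids shown are hypothetical placeholders.

```python
def evaluate_task_b(submitted, reference):
    """Score (doc_id, start, end, concept_id) tuples on exact agreement (assumption)."""
    submitted, reference = set(submitted), set(reference)
    matched = len(submitted & reference)
    precision = matched / len(submitted) if submitted else 0.0
    recall = matched / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Placeholder concept ids, not real UMLS or UniProt identifiers.
submitted_b = [
    ("doc1", 0, 5, "concept:0001"),
    ("doc1", 10, 18, "concept:0002"),  # correct boundaries, wrong concept id
]
reference_b = [
    ("doc1", 0, 5, "concept:0001"),
    ("doc1", 10, 18, "concept:0099"),
]
task_b_scores = evaluate_task_b(submitted_b, reference_b)
print(task_b_scores)
```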

For the preferred concept systems, if the semantic group is not specified explicitly, the evaluation system is able to derive semantic groups from semantic types or from concept ids in this order.
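
The fallback order described here could look like the following sketch. The lookup tables, field names and identifiers are hypothetical stand-ins; the actual mappings belong to the preferred concept systems and are not reproduced here.

```python
# Hypothetical lookup tables standing in for the preferred concept systems.
TYPE_TO_GROUP = {"type:enzyme": "gene_protein"}
CONCEPT_TO_GROUP = {"concept:0001": "disease"}

def resolve_group(annotation):
    """Return the semantic group of an annotation given as a dict.

    Order of preference: an explicit group, then the semantic type,
    then the concept id. Returns None if nothing can be derived.
    """
    if annotation.get("group"):
        return annotation["group"]
    semantic_type = annotation.get("semantic_type")
    if semantic_type in TYPE_TO_GROUP:
        return TYPE_TO_GROUP[semantic_type]
    return CONCEPT_TO_GROUP.get(annotation.get("concept_id"))

explicit = resolve_group({"group": "species", "semantic_type": "type:enzyme"})
from_type = resolve_group({"semantic_type": "type:enzyme", "concept_id": "concept:0001"})
from_concept = resolve_group({"concept_id": "concept:0001"})
unknown = resolve_group({"concept_id": "concept:9999"})
print(explicit, from_type, from_concept, unknown)
```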
