Challenge Rollout

The CALBC project partners have finished a pilot phase of the challenge, in which each partner contributed annotations to the corpus. We now encourage external participants to contribute annotations that can be integrated into the corpus before the CALBC challenge opens. Participants have to sign up to the CALBC mailing list for the challenge. To sign up, send a request to

Annotation guidelines are available here.

The full corpus is available here. In the current phase, the corpus does not contain any annotations. A small set of 1,485 annotated documents is available here. This smaller corpus can be used to reproduce annotations.

Timeline

  • 1-Sep-2009:
    all 150,000 documents are made available to the public in one chunk
  • Mid-October 2009:
    participants are encouraged to annotate the full set of 150,000 documents and submit it to the submission site. Where possible, the participants' annotations will be integrated into the 150,000 documents of the harmonized corpus
  • November 2009:
    50,000 annotated documents will be made available to the public for training. The remaining 100,000 un-annotated documents will be made available for testing at a later stage
  • January 2010 (tentative):
    submission of the final sets of annotated documents (100,000 documents)