An alliance to advance the understanding of collaboratories
Science of Collaboratories



Name of Collaboratory :


Collaborative Annotation of a Large Biomedical Corpus (CALBC)


URL :  

Collaboratory Status :

Completed
Start Date : 2009
End Date : 2011
Info Last Updated : Thu, Dec 9 2010 7:01pm PST

Primary Collaboratory Function :

  Open Community Contribution System  

Secondary Collaboratory Functions :


Domain(s) :

MATHEMATICS/COMPUTER SCIENCES >Computer Science >Information Science and Systems
BIOLOGICAL/AGRICULTURAL SCIENCES >Biological Sciences >Biomedical Sciences

Brief Description of the Collaboratory :


CALBC (Collaborative Annotation of a Large Biomedical Corpus) is a European Support Action addressing the automatic generation of a very large, community-wide shared text corpus annotated with biomedical entities. The goal is to create a broadly scoped and diversely annotated corpus (about one million Medline immunology-related abstracts annotated with different semantic types) by automatically integrating the annotations from different named entity recognition systems.
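
One way to picture the automatic integration of annotations from several systems is a simple vote. The sketch below is an illustration only and is not the CALBC harmonisation procedure: the tuple layout `(doc_id, start, end, semantic_type)`, the function name `harmonise`, the `min_votes` threshold, and the sample data are all assumptions introduced here.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical annotation layout: (doc_id, start_offset, end_offset, semantic_type).
Annotation = Tuple[str, int, int, str]


def harmonise(system_outputs: Iterable[Set[Annotation]], min_votes: int = 2) -> List[Annotation]:
    """Keep annotations proposed by at least `min_votes` systems (exact-match voting)."""
    votes: Counter = Counter()
    for output in system_outputs:
        # Each output is a set, so a system votes at most once per annotation.
        votes.update(output)
    return sorted(a for a, n in votes.items() if n >= min_votes)


# Made-up outputs from three hypothetical named entity recognition systems.
system_a = {("pmid-1", 0, 11, "protein"), ("pmid-1", 20, 28, "disease")}
system_b = {("pmid-1", 0, 11, "protein"), ("pmid-1", 30, 36, "species")}
system_c = {("pmid-1", 0, 11, "protein"), ("pmid-1", 20, 28, "disease")}

consensus = harmonise([system_a, system_b, system_c], min_votes=2)
print(consensus)
```

Raising `min_votes` trades coverage for agreement: with three systems, a threshold of 3 keeps only the annotations on which every system agrees.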

Participation was open to any team willing to submit annotations produced by its own named entity recognition or concept identification system. Participants received a fully automated assessment of their results against the Silver Standard Corpus (SSC).
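
An automated assessment of a submission against a reference corpus such as the SSC could look roughly like the sketch below. The record does not say which metrics or matching criteria CALBC used, so exact-match precision, recall, and F1 are assumptions here, as are the function name `score`, the annotation layout, and the sample data.

```python
from typing import Dict, Set, Tuple

# Hypothetical annotation layout: (doc_id, start_offset, end_offset, semantic_type).
Annotation = Tuple[str, int, int, str]


def score(submitted: Set[Annotation], reference: Set[Annotation]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 of a submission against a reference set."""
    true_positives = len(submitted & reference)
    precision = true_positives / len(submitted) if submitted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up reference and submission for a single abstract.
reference = {("pmid-1", 0, 11, "protein"), ("pmid-1", 20, 28, "disease")}
submitted = {("pmid-1", 0, 11, "protein"), ("pmid-1", 30, 36, "species")}

result = score(submitted, reference)
print(result)
```

Exact matching is the strictest choice; an evaluation could equally credit overlapping spans, which would require comparing offsets rather than intersecting sets.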

The resulting corpus can be exploited for different goals:

The text mining community can train existing text mining solutions to reproduce the CALBC annotations.
Novel text mining solutions can be developed using the corpus, such as new methods for the disambiguation of entities.
CALBC will provide a larger body of biomedical information than is currently available to the text mining community.
The corpus will be delivered in a Resource Description Framework (RDF) representation so that it can be integrated in the Semantic Web. The corpus will serve as a data resource for data mining solutions that contribute to the understanding of immunological questions.

The project started in January 2009 and terminated in June 2011. During this period the project partners organised the first and the second CALBC challenge in autumn 2009 and autumn 2010, respectively. The main outcome of the project will be an annotated corpus of about 1 million Medline abstracts providing annotations from the first and the second challenge.

October 2009: Challenge I opens
February 14th, 2010 (extended): Challenge I closes
June 17th to 18th, 2010: First CALBC Workshop (EBI, Hinxton, Cambridge, U.K.)
September 13th, 2010: Challenge II opens
October 19th, 2010: CALBC training data available
December 15th, 2010: Challenge II closes
March 2011: Second and final CALBC Workshop
June 30th, 2011: Final harmonized corpus available (although it does not appear to be available on the project website).


Access to Instruments :


Access to Information Resources :

  Those who wanted to participate in the CALBC challenge had to email a request for registration. They then received access to the CALBC challenge mailing list and login details for the CALBC submission site, which gave them access to the annotation guidelines, corpora, resources, and evaluation services.  

Access to People as Resources :

  The project organised two workshops (June 2010 and March 2011).  

Funding Agency or Sponsor :

European Union (EU)

Notes on Funding Agencies/Sponsors:

Organizations with Funded Participants:
Organization name | Approx # of participants | Description of organization's role(s)
Erasmus Medical Center | | Project partner
European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI) | | Project partner
Friedrich-Schiller University in Jena | | Project partner
 | | Project partner

Notes on Participants/Organizations:
The above are the project organizers. Project participants are from Europe and Asia.


Communications Technology Used :

  Email, mailing lists.  

Technical Capabilities :

  Key Articles :    

Project-reported performance data :


University of California, Irvine

School of Information University of Michigan