Version: cosegment2012-06-30

The implementation of Collocation Segmentation used in:

  • Daudaravičius V. Applying Collocation Segmentation to the ACL Anthology Reference Corpus. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics. Jeju, Korea, July 2012.
  • Daudaravičius V. Automatic Multilingual Annotation of EU Legislation with Eurovoc Descriptors. In Proceedings of the Workshop on Exploring and Exploiting Official Publications, LREC, Istanbul, 2012.

Implemented in the Haskell programming language.


  • Plain-text chunking with collocation segmentation
  • Adjustable collocation segmentation threshold
  • Similar-text search in a corpus
  • Build feature vectors of profiles and find the ones most related to a new text
  • Search for collocations that are similar in context
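
To illustrate what threshold-based chunking means in practice, here is a minimal sketch in Haskell. It is not the CoSegment code and does not attempt to reproduce the exact scoring or boundary rules of the papers above. It assumes a Dice score over adjacent word pairs estimated from a toy corpus, and it places a segment boundary wherever that score falls below a fixed threshold. The names counts, dice, segment, toyCorpus and demo are hypothetical and chosen only for this example.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical helper: count how often each item occurs in a list.
counts :: Ord a => [a] -> M.Map a Int
counts xs = M.fromListWith (+) [(x, 1) | x <- xs]

-- Dice score of an adjacent word pair, estimated from unigram and bigram
-- counts. Assumption: pairs whose words were never seen score 0.
dice :: M.Map String Int -> M.Map (String, String) Int -> (String, String) -> Double
dice uni bi (a, b)
  | denom == 0 = 0
  | otherwise  = 2 * pairCount / denom
  where
    pairCount = fromIntegral (M.findWithDefault 0 (a, b) bi)
    denom     = fromIntegral (M.findWithDefault 0 a uni + M.findWithDefault 0 b uni)

-- Split a token sequence wherever the score of two neighbouring words
-- falls below the threshold t.
segment :: Double -> ((String, String) -> Double) -> [String] -> [[String]]
segment _ _ [] = []
segment t score (w:ws) = go [w] w ws
  where
    go acc _ [] = [reverse acc]
    go acc prev (x:xs)
      | score (prev, x) < t = reverse acc : go [x] x xs
      | otherwise           = go (x : acc) x xs

-- Toy corpus, used only to obtain some counts for the demonstration.
toyCorpus :: [String]
toyCorpus = words "new york is big new york is far a big city is far"

-- Segment one sentence with a threshold of 0.9.
demo :: [[String]]
demo = segment 0.9 (dice uni bi) (words "new york is big")
  where
    uni = counts toyCorpus
    bi  = counts (zip toyCorpus (drop 1 toyCorpus))
-- demo == [["new","york"],["is"],["big"]]
```

Lowering the threshold merges more neighbouring words into one segment: with 0.5 instead of 0.9, the same sentence is split into "new york is" and "big". This is the sense in which changing the threshold changes the granularity of the chunks.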

CoSegment (2012 June 30):

Old versions

Version: 2010.04

The implementation of the collocation segmentation used in (Daudaravičius V. The Influence of Collocation Segmentation and Top 10 Items to Keyword Assignment Performance. In Proceedings of Computational Linguistics and Intelligent Text Processing, CICLing-2010, Iasi, Romania, 2010. Lecture Notes in Computer Science. Springer-Verlag. 648–660.)

Collocation segmentation: (for Windows) cosegment.tgz (for Linux).
Written in C++. OS independent.

CoSegment Readme file