ACOPOST - A Collection Of POS Taggers

News

2002/09/23
Renamed ICOPOST to ACOPOST and moved the package to the Sourceforge repository of open source projects. Released version 1.8.4, which contains a preliminary user's guide. The project urgently needs maintainers, admins, developers, active users, etc., since I (Ingo Schröder) won't have the time to maintain the package in the future. Please mail me at ixs at users.sourceforge.net.
2002/04/24
Release 1.8.3BETA, contains an additional tagger based on example-based techniques, not documented on this page!
2001/08/21
Release 0.9.0 (first public release).
2001/08/20
Cleaned things up.
2001/07/14
First public talk about ICOPOST.
2001/06/08
Web page started.

What's ACOPOST about?

Part-of-speech (POS) tagging is the task of assigning grammatical classes to words in a natural language sentence. It's important because subsequent processing stages (such as parsing) become easier if the word class for a word is available.

Here's an English example of a tagged sentence taken from the Wall Street Journal of the Penn Treebank:
Measures           NNS 
of                 IN
manufacturing      VBG 
activity           NN 
fell               VBD
more               RBR
than               IN 
the                DT 
overall            JJ 
measures           NNS 
.                  .

ACOPOST is a set of freely available POS taggers that I modelled after well-known techniques. The programs are written in C and run under various UNIX flavors (and probably even under Windows). ACOPOST currently consists of four taggers which are based on different frameworks:
  1. Maximum Entropy Tagger MET: This tagger uses an iterative procedure to successively improve parameters for a set of features that help to distinguish between relevant contexts. It's based on a framework suggested by Ratnaparkhi [1998].
  2. Trigram Tagger T3: This kind of tagger is based on Hidden Markov Models where the states are tag pairs that emit words, i.e., it's based on transition and lexical probabilities. The technique was suggested by Rabiner [1990] and the implementation is influenced by Brants [2000].
  3. Error-driven Transformation-based Tagger TBT: Transformation rules that change the currently assigned tag depending on triggering context conditions are learned from an annotated corpus. The general approach as well as its application to POS tagging was proposed by Brill [1993].
  4. Example-based Tagger ET: Example-based models (also called memory-based, instance-based or distance-based) rest on the assumption that cognitive behavior can be achieved by looking at past experiences that resemble the current problem rather than by learning and applying abstract rules. They were suggested for NLP by Daelemans et al. [1996].
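
To make the idea of "triggering context conditions" in the transformation-based approach more concrete, here is a minimal sketch in C of how an ordered list of learned rules might be applied to an initial tag sequence. It is an illustration only, not code from ACOPOST: the struct rule layout, the single rule template ("change tag X to Y if the preceding tag is Z") and the function names apply_rule, apply_rules and print_tagged are assumptions made for this example, and the toy rule is not taken from any learned rule file. Learning the rules from a corpus is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical rule template (not ACOPOST's rule format):
 * "change tag `from` to `to` if the preceding tag is `prev`". */
struct rule {
    const char *from;
    const char *to;
    const char *prev;
};

/* Apply one rule left to right over a tag sequence of length n.
 * A change takes effect immediately, so it can trigger the same
 * rule at the next position. Returns the number of tags changed. */
int apply_rule(const struct rule *r, const char **tags, int n)
{
    int changed = 0;

    assert(r != NULL && tags != NULL);
    for (int i = 1; i < n; i++) {
        if (strcmp(tags[i], r->from) == 0 &&
            strcmp(tags[i - 1], r->prev) == 0) {
            tags[i] = r->to;
            changed++;
        }
    }
    return changed;
}

/* Apply the rules in the given order, one after the other.
 * Returns the total number of tags changed. */
int apply_rules(const struct rule *rules, int nrules,
                const char **tags, int n)
{
    int changed = 0;

    for (int k = 0; k < nrules; k++)
        changed += apply_rule(&rules[k], tags, n);
    return changed;
}

/* Print one word and its tag per line. */
void print_tagged(const char **words, const char **tags, int n)
{
    for (int i = 0; i < n; i++)
        printf("%-18s %s\n", words[i], tags[i]);
}
```

As a usage example, take the toy sentence "They want to race ." with the initial tags PRP VBP TO NN . (for instance, each word's most frequent tag). Calling apply_rules with the single rule { "NN", "VB", "TO" } changes the tag of "race" from NN to VB, because the preceding tag is TO, and leaves the other tags untouched.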

A detailed description, an extensive evaluation and new suggestions can be found in an accompanying technical report [Schröder 2002].

Further information

The project page at Sourceforge can be reached at http://sourceforge.net/projects/acopost/ where the latest releases can be found.

Mailing lists are available for announcements, for developers and for users at http://sourceforge.net/mail/?group_id=62355.

References

Thorsten Brants. 2000. TnT - a statistical part-of-speech tagger. In Proceedings of the Sixth Applied Natural Language Processing Conference (ANLP-2000), Seattle, WA, USA.

Eric Brill. 1993. Automatic grammar induction and parsing free text: A transformation-based approach. In Proceedings of the 31st Annual Meeting of the ACL.

Walter Daelemans, Jakub Zavrel, Peter Berck & Steven Gillis. 1996. MBT: A memory-based part of speech tagger-generator. In Eva Ejerhed & Ido Dagan, ed., Proceedings of the Fourth Workshop on Very Large Corpora, pages 14-27.

Ingo Schröder. 2002. A Case Study in Part-of-Speech tagging Using the ICOPOST Toolkit. Technical report FBI-HH-M-314/02. Department of Computer Science, University of Hamburg. Available from http://nats-www.informatik.uni-hamburg.de/~ingo/papers/.

Lawrence R. Rabiner. 1990. A tutorial on hidden Markov models and selected applications in speech recognition. In Alex Waibel & Kai-Fu Lee, ed., Readings in Speech Recognition. Morgan Kaufmann, San Mateo, CA, USA, pages 267-290. See also Errata at http://www.media.mit.edu/~rahimi/rabiner/rabiner-arrata/.

Adwait Ratnaparkhi. 1998. Maximum Entropy Models for Natural Language Ambiguity Resolution. Ph.D. thesis, University of Pennsylvania.