Internal and External Tagsets in Part-of-Speech Tagging
Author: Thorsten Brants
We present an approach to statistical part-of-speech tagging that uses two different tagsets, one for its internal and one for its external
representation. The internal tagset is used in the underlying Markov model, while the external tagset constitutes the output of the tagger. The internal
tagset can be modified and optimized to increase tagging accuracy (with respect to the external tagset). We evaluate this approach in an experiment
and show that it performs significantly better than approaches using only one tagset.
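
The split between the two tagsets can be illustrated with a minimal sketch. It assumes a toy bigram Markov model whose internal tagset refines one external tag (NOUN) into two internal tags; the tag names, the probabilities, the smoothing floor, and the helper names (viterbi, tag) are all hypothetical and are not taken from the experiment described here. Decoding runs over internal tags only, and each internal tag is mapped to its external tag before output.

```python
import math

# Hypothetical mapping: every internal tag projects onto one external tag.
INTERNAL_TO_EXTERNAL = {
    "DET": "DET",
    "NOUN-SUBJ": "NOUN",
    "NOUN-OBJ": "NOUN",
    "VERB": "VERB",
}
INTERNAL_TAGS = list(INTERNAL_TO_EXTERNAL)

# Toy model parameters over the INTERNAL tagset (illustrative values only).
START_P = {"DET": 1.0}
TRANS_P = {
    ("DET", "NOUN-SUBJ"): 0.5,
    ("DET", "NOUN-OBJ"): 0.5,
    ("NOUN-SUBJ", "VERB"): 1.0,
    ("VERB", "DET"): 1.0,
}
EMIT_P = {
    ("DET", "the"): 1.0,
    ("NOUN-SUBJ", "dog"): 0.5,
    ("NOUN-SUBJ", "cat"): 0.5,
    ("NOUN-OBJ", "dog"): 0.5,
    ("NOUN-OBJ", "cat"): 0.5,
    ("VERB", "sees"): 1.0,
}
FLOOR = 1e-6  # crude stand-in for smoothing of unseen events


def logp(table, key):
    """Log-probability lookup with a floor for unseen events."""
    return math.log(table.get(key, FLOOR))


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable internal tag sequence for a non-empty sentence."""
    # Each column maps tag -> (best log score, backpointer to previous tag).
    columns = [{t: (logp(start_p, t) + logp(emit_p, (t, words[0])), None)
                for t in tags}]
    for word in words[1:]:
        prev = columns[-1]
        col = {}
        for t in tags:
            score, back = max((prev[p][0] + logp(trans_p, (p, t)), p)
                              for p in tags)
            col[t] = (score + logp(emit_p, (t, word)), back)
        columns.append(col)
    # Follow backpointers from the best final tag.
    current = max(tags, key=lambda t: columns[-1][t][0])
    path = [current]
    for col in reversed(columns[1:]):
        current = col[current][1]
        path.append(current)
    return path[::-1]


def tag(words):
    """Decode with the internal tagset, then emit external tags."""
    internal = viterbi(words, INTERNAL_TAGS, START_P, TRANS_P, EMIT_P)
    return [INTERNAL_TO_EXTERNAL[t] for t in internal]


if __name__ == "__main__":
    print(tag(["the", "dog", "sees", "the", "cat"]))
```

Because accuracy is measured on the external tags, the internal tagset can be refined or coarsened freely as long as every internal tag still maps to exactly one external tag.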