Natural Language Processing with Java and LingPipe Cookbook by Krishna Dayanidhi, Breck Baldwin

HMM-based NER

HmmChunker uses an HMM to perform chunking over tokenized character sequences. Instances contain an HMM decoder for the model and a tokenizer factory. The chunker requires the states of the HMM to conform to a token-by-token encoding of a chunking. It uses the tokenizer factory to break chunks down into sequences of tokens and tags. Refer to the Hidden Markov Models (HMM) – part-of-speech recipe in Chapter 4, Tagging Words and Tokens.
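As a sketch of how these pieces fit together, the following assumes a chunking HiddenMarkovModel has already been trained and serialized somewhere (the model path, the example sentence, and the choice of IndoEuropeanTokenizerFactory are placeholders, not part of this recipe); the decoder wraps the model, and the chunker pairs the decoder with the same tokenizer factory that was used at training time:

    import java.io.File;
    import com.aliasi.chunk.Chunk;
    import com.aliasi.chunk.Chunking;
    import com.aliasi.chunk.HmmChunker;
    import com.aliasi.hmm.HiddenMarkovModel;
    import com.aliasi.hmm.HmmDecoder;
    import com.aliasi.tokenizer.IndoEuropeanTokenizerFactory;
    import com.aliasi.tokenizer.TokenizerFactory;
    import com.aliasi.util.AbstractExternalizable;

    public class HmmChunkerSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder path to a serialized HiddenMarkovModel trained for chunking.
            File modelFile = new File("models/spanish-ner.hmm");
            HiddenMarkovModel hmm
                = (HiddenMarkovModel) AbstractExternalizable.readObject(modelFile);

            // The decoder runs the HMM; the tokenizer factory must match the one
            // used when the model was trained.
            HmmDecoder decoder = new HmmDecoder(hmm);
            TokenizerFactory tokenizerFactory
                = IndoEuropeanTokenizerFactory.INSTANCE;

            HmmChunker chunker = new HmmChunker(tokenizerFactory, decoder);

            Chunking chunking
                = chunker.chunk("El presidente de Argentina visitó Madrid.");
            for (Chunk chunk : chunking.chunkSet()) {
                String span = chunking.charSequence()
                    .subSequence(chunk.start(), chunk.end()).toString();
                System.out.println(chunk.type() + ": " + span);
            }
        }
    }

Each Chunk returned by the chunker carries a type (such as PER or LOC) plus character offsets into the original text.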

We'll look at training an HmmChunker and using it for the CoNLL2002 Spanish named-entity task. You can and should use your own data, but this recipe assumes that the training data is in the CoNLL2002 format.
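For reference, the CoNLL2002 data puts one token per line followed by a BIO-style entity tag (B-/I- prefixes on PER, ORG, LOC, and MISC, with O for everything else), and separates sentences with blank lines. The lines below are made up for illustration rather than taken from the corpus:

    Juan B-PER
    Pérez I-PER
    visitó O
    Madrid B-LOC
    . O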

Training is done through the ObjectHandler interface, which is used to supply the training instances to the chunker.
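A minimal training sketch follows, using CharLmHmmChunker and HmmCharLmEstimator from the LingPipe API; the estimator parameters, the hand-built training Chunking, and the output filename are illustrative only, whereas the recipe itself feeds Chunkings parsed from the CoNLL2002 files to the same handle() method:

    import java.io.File;
    import com.aliasi.chunk.CharLmHmmChunker;
    import com.aliasi.chunk.ChunkFactory;
    import com.aliasi.chunk.ChunkingImpl;
    import com.aliasi.hmm.HmmCharLmEstimator;
    import com.aliasi.tokenizer.IndoEuropeanTokenizerFactory;
    import com.aliasi.tokenizer.TokenizerFactory;
    import com.aliasi.util.AbstractExternalizable;

    public class TrainHmmChunkerSketch {
        public static void main(String[] args) throws Exception {
            TokenizerFactory factory = IndoEuropeanTokenizerFactory.INSTANCE;

            // Character language model HMM estimator; the n-gram length, number
            // of characters, and interpolation ratio are illustrative values.
            HmmCharLmEstimator hmmEstimator = new HmmCharLmEstimator(8, 256, 8.0);

            // CharLmHmmChunker acts as an ObjectHandler<Chunking>: every
            // Chunking passed to handle() is one training instance.
            CharLmHmmChunker trainer = new CharLmHmmChunker(factory, hmmEstimator);

            // One hand-built training instance, standing in for the Chunkings
            // a CoNLL2002 parser would supply.
            String sentence = "Juan Pérez visitó Madrid .";
            ChunkingImpl chunking = new ChunkingImpl(sentence);
            chunking.add(ChunkFactory.createChunk(0, 10, "PER"));
            chunking.add(ChunkFactory.createChunk(18, 24, "LOC"));
            trainer.handle(chunking);

            // Compile the trained model to disk; the path is a placeholder.
            AbstractExternalizable.compileTo(trainer,
                new File("spanish-ner.HmmChunker"));
        }
    }

The compiled model written by AbstractExternalizable.compileTo() can later be read back with AbstractExternalizable.readObject() and used for chunking as shown earlier.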

Getting ready

As we want to train ...
