Training our own POS-taggers

The POS tags produced by spaCy's models are statistical predictions, unlike, say, whether or not a token is a stop word, which is just a check against a list of words. Because a tag is a statistical prediction, we can train a model to make better predictions, or predictions that are more relevant to the dataset we intend to use it on. Here, "better" isn't meant to be taken too literally: the current spaCy model already reaches 97% tagging accuracy.
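To make the distinction concrete, here is a minimal sketch in plain Python. It is not how spaCy tags text; it swaps in a much simpler most-frequent-tag baseline purely for illustration. The stop list, the helper names (`is_stop`, `train_tagger`, `tag`), and the tiny tagged corpora are all made up for this example. The point is only that a list lookup never changes, while a model learned from counts changes when the training data does.

```python
from collections import Counter, defaultdict

# Hypothetical stop list: membership is a fixed lookup, nothing is learned.
STOP_WORDS = {"the", "a", "an", "is", "on", "to"}


def is_stop(word):
    """Return True if the word appears in the fixed stop list."""
    return word.lower() in STOP_WORDS


def train_tagger(tagged_sentences):
    """Learn, for each word, the tag it carries most often in the data."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag_label in sentence:
            counts[word.lower()][tag_label] += 1
            all_tags[tag_label] += 1
    # Unseen words fall back to the most common tag overall.
    default_tag = all_tags.most_common(1)[0][0]
    model = {w: tags.most_common(1)[0][0] for w, tags in counts.items()}
    return model, default_tag


def tag(model, default_tag, words):
    """Tag each word with its learned tag, or the default if unseen."""
    return [(w, model.get(w.lower(), default_tag)) for w in words]


# Toy, invented training data: a "general" corpus and a travel-booking one.
GENERAL = [
    [("the", "DET"), ("book", "NOUN"), ("is", "AUX"),
     ("on", "ADP"), ("the", "DET"), ("table", "NOUN")],
    [("a", "DET"), ("good", "ADJ"), ("book", "NOUN")],
]
TRAVEL = [
    [("book", "VERB"), ("a", "DET"), ("flight", "NOUN")],
    [("book", "VERB"), ("the", "DET"), ("hotel", "NOUN")],
    [("book", "VERB"), ("a", "DET"), ("table", "NOUN")],
]

general_model, general_default = train_tagger(GENERAL)
travel_model, travel_default = train_tagger(GENERAL + TRAVEL)

print(is_stop("the"))                                # same answer whatever we train on
print(tag(general_model, general_default, ["book"]))  # tag learned from general text
print(tag(travel_model, travel_default, ["book"]))    # tag shifts with in-domain data
```

Adding the travel sentences changes the tag the model assigns to "book", while the stop-word check is unaffected. This is the sense in which training can make predictions more relevant to a particular dataset.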

Before we dive deep into the training process, let's clarify a few terms that are commonly used in machine learning, and in machine learning for text.

Training - the process of teaching your machine learning ...
