Chapter 7. How to Explain a Text Classifier
In the previous chapters, we have learned a lot about advanced analytical methods for unstructured text data. Starting with statistics and moving on to NLP, we have gained interesting insights from text.
Using supervised classification methods, we have trained algorithms to assign text documents to predefined categories. Although we have checked the quality of the classification, we have skipped an important aspect: we have no idea why the model assigned a particular category to a text.
This might seem unimportant if the category is correct. In daily life, however, you often have to explain your own decisions and make them transparent to others. The same is true for machine learning algorithms.
In real-life projects, you will more often than not hear the question “Why has the algorithm assigned this category/sentiment?” Even before that, understanding how the algorithm has learned something will help you improve the classification by using different algorithms, adding features, changing weights, and so on. Compared to structured data, this question is much more important for text, because humans can read and interpret the text itself and therefore judge whether the model's reasoning makes sense. Moreover, text contains many artifacts, such as signatures in emails, that you should avoid; make sure they do not become the dominant features in your classification.
In addition to the technological perspective, there are also some legal aspects to keep in mind. You might be responsible for proving that ...