Appendix D. Machine learning tools and techniques
Much of natural language processing involves machine learning, so it pays to understand some of its basic tools and techniques. Some have been covered in earlier chapters and some haven’t, but all warrant at least a few words here.
D.1 Data selection and avoiding bias
Data selection and feature engineering are fraught with the hazards of bias (in human terms). Once you’ve baked your own biases into your algorithm by choosing a particular set of features, the model will fit to those biases and produce biased results. Even if you’re lucky enough to discover this bias before going to production, undoing it can require a significant amount of effort. Your entire pipeline must ...