5 Author profiling as a machine-learning task

This chapter covers

  • Implementing your user profiling algorithm
  • Exploring NLP techniques with NLTK and spaCy
  • Introducing scikit-learn
  • Applying the Decision Tree machine-learning classifier

In this chapter and the next, you will build your own algorithm that can identify the profile, or even the precise identity, of an anonymous author based solely on their writing. This task brings together several useful NLP concepts and techniques introduced in the previous chapters. You’ve learned that

  • Tokenizers can be applied to split text into individual words.

  • Words may be meaningful, or they may simply express some function (e.g., linking ...
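
The two points recalled above can be illustrated with a minimal sketch. It uses only Python's standard library: the regex tokenizer is a simplified stand-in for the NLTK and spaCy tokenizers this chapter works with, and `FUNCTION_WORDS` is a small, hand-picked, hypothetical list rather than a real stopword resource.

```python
import re

# Hypothetical, hand-picked function words for illustration only;
# real stopword lists (e.g., those shipped with NLP toolkits) are much longer.
FUNCTION_WORDS = {"the", "a", "an", "and", "or", "but",
                  "of", "in", "on", "to", "is", "was"}


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def split_by_role(tokens):
    """Separate function words from the remaining (content) words."""
    function, content = [], []
    for token in tokens:
        if not token.isalnum():
            continue  # skip punctuation tokens
        if token.lower() in FUNCTION_WORDS:
            function.append(token)
        else:
            content.append(token)
    return function, content


tokens = tokenize("The author of the letter was anonymous.")
function, content = split_by_role(tokens)
print(tokens)    # all tokens, including the final period
print(function)  # words that mainly serve a grammatical role
print(content)   # words that carry most of the meaning
```

A toolkit tokenizer handles contractions, abbreviations, and other edge cases that this regex does not, so treat the sketch as a reminder of the idea rather than a replacement for those tools.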
