Chapter 6: String Clustering

When wrangling data, you will frequently find columns whose values look alike but are not identical, for example the same name entered with different casing, spacing, or spelling. To handle this, Optimus provides techniques for detecting which strings are similar and grouping them, and for each group it offers candidate values that can point to the best one to keep. We will explore all these techniques in this chapter.

In this chapter, we will learn about the following topics:

  • Exploring string clustering
  • Key collision methods
  • Phonetic encoding
  • Nearest-neighbor methods
  • Applying suggestions
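To make the idea behind key collision methods concrete before going further, here is a minimal sketch in plain Python. It is not the Optimus API: the helper names `fingerprint` and `cluster_strings`, the normalization steps, and the sample data are all assumptions chosen for illustration. Each value is reduced to a normalized key, values that share a key are grouped, and the most frequent spelling in each group is offered as the suggestion.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Hypothetical helper: build a key so that similar spellings collide."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Trim surrounding whitespace and ignore case.
    text = text.strip().lower()
    # Replace punctuation with spaces, then sort the unique tokens.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster_strings(values):
    """Hypothetical helper: group values by key and suggest a spelling."""
    groups = defaultdict(Counter)
    for value in values:
        groups[fingerprint(value)][value] += 1

    clusters = []
    for counts in groups.values():
        # Only groups with more than one distinct spelling need attention.
        if len(counts) > 1:
            suggestion = counts.most_common(1)[0][0]
            clusters.append({"suggestion": suggestion, "values": dict(counts)})
    return clusters


# Made-up sample column with inconsistent spellings.
sample = ["New York", "new york", "York, New", "New York",
          "Boston", "boston ", "Chicago"]

for cluster in cluster_strings(sample):
    print(cluster)
```

With this sample, the New York and Boston variants each form a cluster, while "Chicago" appears only once and is left alone. Sorting the unique tokens is what lets "York, New" collide with "New York". Misspellings such as a missing letter do not collide under this key, which is the gap that phonetic encoding and nearest-neighbor methods aim to cover.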

Technical requirements

Optimus can work with multiple backend technologies to process data, including graphics processing units (GPUs). For GPUs, Optimus uses the ...
