SQL for Data Analysis

by Cathy Tanimura
September 2021
Beginner to intermediate
357 pages
9h 53m
English
O'Reilly Media, Inc.

Chapter 5. Text Analysis

In the last two chapters, we explored applications of dates and numbers with time series analysis and cohort analysis. But data sets are often more than just numeric values and associated timestamps. From qualitative attributes to free text, character fields are often loaded with potentially interesting information. Although databases excel at numeric calculations such as counting, summing, and averaging, they are also quite good at performing operations on text data.

I’ll begin this chapter by providing an overview of the types of text analysis tasks that SQL is good for, and of those for which another programming language is a better choice. Next, I’ll introduce our data set of UFO sightings. Then we’ll get into coding, covering text characteristics and profiling, parsing data with SQL, making various transformations, constructing new text from parts, and finally finding elements within larger blocks of text, including with regular expressions.

Why Text Analysis with SQL?

Among the huge volumes of data generated every day, a large portion consists of text: words, sentences, paragraphs, and even longer documents. Text data used for analysis can come from a variety of sources, including descriptors populated by humans or computer applications, log files, support tickets, customer surveys, social media posts, or news feeds. Text in databases ranges from structured (where data is in different table fields with distinct meanings) to semistructured ...

ISBN: 9781492088776