Live online training

Text analysis for business analytics with Python

Extracting insight from text data

Topic: Data
Walter Paczkowski, Ph.D.

Social media and online reviews in the internet era have given businesses a new form of data: text. Unlike the well-structured, numeric data of the pre-internet era, text data is highly unstructured and chaotic. It includes verbatim survey responses, call center logs, notes from field representatives, customer emails, logs of online chats, warranty claims, dealer technician lines, and report orders. And yet it is data: a structure can be imposed, and it can be analyzed to extract useful information and insights for decision making in areas such as new product development, customer service, and message development. The problem is that few business analysts know how to work with text data, and many are overwhelmed by the number of toolsets available for text analysis.

Join expert Walter Paczkowski to learn how to work with text data to extract meaningful insights such as sentiments (positive and negative) about your products and company, opinions, product suggestions and complaints, customer misunderstandings, and competitive actions and positions. Over three hours, you’ll dive into sophisticated text processing tools and methods and discover the possibilities of text-processing software, such as Python packages.

What you'll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • The unstructured nature of text data, including the concepts of a document and a corpus
  • The singular value decomposition (SVD) of a document-term matrix (DTM)
  • Python packages used for text analysis and when to use them
  • How to prepare text data for analysis, including data cleaning, stop words, and grammar inconsistencies
  • How to summarize text data using term frequency-inverse document frequency (TF-IDF) weights
  • How to extract meaning from a DTM: keywords, phrases, and topics

And you’ll be able to:

  • Impose structure on text data
  • Extract keywords, phrases, and topics with text analysis tools
  • Analyze a business text dataset for key insights using Python packages
  • Apply these techniques to business problems

This training course is for you because...

  • You’re an advanced business analyst who deals with text data.
  • Your background is largely analytical, and you want to expand both your knowledge and toolset of analytical methods.


Prerequisites:

  • Familiarity with Python and the Jupyter Notebook


About your instructor

  • Walter R. Paczkowski is a market research consultant at Data Analytics Corp., where he helps companies turn their market data into actionable information. His clients span a wide range of industries, including telecommunications, pharmaceuticals, jewelry, food and beverages, and automotive. Walter is also an adjunct faculty member of the Department of Economics at Rutgers University. He draws on more than 40 years of quantitative experience as an analyst in AT&T's Analytical Support Center, a member of the technical staff at AT&T Bell Labs, head of pricing research in AT&T's Computer Systems Division, and founder of Data Analytics Corp. He was also an adjunct faculty member of the Department of Mathematics and Statistics at the College of New Jersey. Walter is the author of two analytical books, Market Data Analysis Using JMP (SAS Press, 2016) and Pricing Analytics (Routledge, 2018), with a third forthcoming on quantitative methods for new product development (Routledge, 2020). He holds a PhD in economics from Texas A&M University.


The timeframes are only estimates and may vary according to how the class is progressing.

Introduction (25 minutes)

  • Lecture, demonstrations, and exercises: text data in a Big Data environment; the structure of text data.
  • Group Discussion
  • Q&A

Text Data Fundamentals (30 minutes)

  • Lecture, demonstrations, and exercises: processing and cleaning text data using Python’s built-in str methods, pandas string methods, and regular expressions.
  • Group Discussion
  • Q&A
  • Break (5 minutes)
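
As a rough illustration of the kind of cleaning this session describes (a sketch only, not the course's actual material), the snippet below combines built-in str methods with regular expressions. The sample comments and the clean_text helper are hypothetical and invented for this example.

```python
import re

# Hypothetical customer comments, invented for illustration only.
raw_comments = [
    "  GREAT product!!!  Would buy again :) ",
    "Battery died after 2 days... <br> very disappointed",
    "Support was OK, but shipping took 3 weeks",
]

def clean_text(text):
    """Lowercase, strip HTML-like tags, drop non-letters, collapse whitespace."""
    text = text.lower()                      # built-in str method
    text = re.sub(r"<[^>]+>", " ", text)     # remove HTML-like tags such as <br>
    text = re.sub(r"[^a-z\s]", " ", text)    # keep only letters and whitespace
    text = re.sub(r"\s+", " ", text)         # collapse runs of whitespace
    return text.strip()                      # trim leading/trailing spaces

cleaned = [clean_text(comment) for comment in raw_comments]
for comment in cleaned:
    print(comment)
```

The same steps map onto pandas string methods, for example Series.str.lower() and Series.str.replace() with regex=True, when the comments live in a DataFrame column.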

Text Data Preprocessing (30 minutes)

  • Lecture, demonstrations, and exercises: creating a corpus; cleaning the corpus; doing simple analytics with clean text data.
  • Group Discussion
  • Q&A
  • Break (5 minutes)

Text Modeling (75 minutes)

  • Lecture, demonstrations, and exercises: removing punctuation and changing case; tokenizing the corpus; creating a bag-of-words; removing stop words; creating a DTM; weighting the DTM.
  • Group Discussion
  • Q&A
  • Break (5 minutes)
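
The modeling steps listed above can be sketched with nothing more than the standard library. This is a minimal illustration under stated assumptions: the three documents and the tiny stop-word list are hypothetical, and the weighting uses the textbook form tf × log(N / df), which may differ from the variant used in the course.

```python
import math
from collections import Counter

# Hypothetical, already-cleaned documents (invented for illustration).
corpus = [
    "the battery life is great",
    "the battery died after a week",
    "great screen and great price",
]

# A tiny illustrative stop-word list; real projects use a much longer one.
stop_words = {"the", "is", "a", "and", "after"}

# Tokenize on whitespace and drop stop words.
tokens = [[w for w in doc.split() if w not in stop_words] for doc in corpus]

# Bag-of-words: one term-count mapping per document.
bags = [Counter(doc_tokens) for doc_tokens in tokens]

# Document-term matrix: rows are documents, columns are vocabulary terms.
vocab = sorted(set(w for doc_tokens in tokens for w in doc_tokens))
dtm = [[bag[term] for term in vocab] for bag in bags]

# TF-IDF weighting: term count times log(number of docs / document frequency).
n_docs = len(corpus)
doc_freq = [sum(1 for row in dtm if row[j] > 0) for j in range(len(vocab))]
tfidf = [
    [row[j] * math.log(n_docs / doc_freq[j]) for j in range(len(vocab))]
    for row in dtm
]

print(vocab)
for row in dtm:
    print(row)
```

Libraries such as scikit-learn apply smoothed variants of this formula, so their weights will not match these numbers exactly.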

Text Analysis (60 minutes)

  • Lecture, demonstrations, and exercises: word clouds; phrase extraction and analysis; hierarchical cluster analysis; topic modeling by Latent Semantic Analysis.
  • Group Discussion
  • Q&A
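
To give a feel for topic modeling by Latent Semantic Analysis, the sketch below applies a singular value decomposition to a small document-term matrix using NumPy. The matrix and term labels are hypothetical, chosen so that two groups of terms are easy to see. Treat it as an assumption-laden illustration, not the course's own example.

```python
import numpy as np

# Hypothetical document-term matrix (rows: documents, columns: terms).
terms = ["battery", "charge", "price", "cost"]
dtm = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

# SVD factors the DTM as U @ diag(s) @ Vt, with s sorted in descending order.
U, s, Vt = np.linalg.svd(dtm, full_matrices=False)

# Keep the k largest singular values; each one defines a latent "topic".
k = 2
doc_topics = U[:, :k] * s[:k]   # document coordinates in topic space
term_topics = Vt[:k, :]         # term loadings for each topic

# For each topic, collect the two terms with the largest absolute loadings.
top_terms = [
    [terms[i] for i in np.argsort(-np.abs(term_topics[t]))[:2]]
    for t in range(k)
]

for t, names in enumerate(top_terms):
    print(f"topic {t}:", names)
```

The signs of the singular vectors are arbitrary, which is why the sketch ranks terms by absolute loading.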

Wrap-up (5 minutes)