O'Reilly Live Online Training

Text analysis for business analytics with Python



Extracting insight from text data

Walter Paczkowski, Ph.D.

Social media and online reviews in the internet era have given businesses a new form of data: text. Unlike the well-structured, numbers-oriented data of the pre-internet era, text data is highly unstructured and chaotic. It includes verbatim survey responses, call center logs, notes from field representatives, customer emails, logs of online chats, warranty claims, dealer technician lines, and report orders. And yet it is data: a structure can be imposed, and it can be analyzed to extract useful information and insights for decision making in areas such as new product development, customer services, and message development. The problem is that few business analysts know how to work with text data, and those who try are often overwhelmed by the many toolsets available for text analysis.

Join expert Walter Paczkowski to learn how to work with text data to extract meaningful insights such as sentiments (positive and negative) about your products and company, opinions, product suggestions and complaints, customer misunderstandings, and competitive actions and positions. Over three hours, you’ll dive into sophisticated text processing tools and methods and discover the possibilities of text-processing software, such as Python packages.

What you'll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • The unstructured nature of text data, including the concepts of a document and a corpus
  • The singular value decomposition (SVD) of a document-term matrix (DTM)
  • Python packages used for text analysis and when to use them
  • How to prepare text data for analysis, including cleaning the data, removing stop words, and resolving grammatical inconsistencies
  • How to summarize text data using term frequency-inverse document frequency (TF-IDF) weights
  • How to extract meaning from a DTM: keywords, phrases, and topics

And you’ll be able to:

  • Impose structure on text data
  • Extract keywords, phrases, and topics with text analysis tools
  • Analyze a business text dataset for key insights using Python packages
  • Apply these techniques to business problems

This training course is for you because...

  • You’re an advanced business analyst who deals with text data.
  • Your background is largely analytical, and you want to expand both your knowledge and toolset of analytical methods.


Prerequisites:

  • Familiarity with Python and the Jupyter Notebook

Recommended preparation:

Recommended follow-up:

About your instructor

  • Walter R. Paczkowski has a Ph.D. in Economics from Texas A&M University (1977). With over 40 years of quantitative experience as an analyst in AT&T's Analytical Support Center, a Member of the Technical Staff at AT&T Bell Labs, head of Pricing Research at AT&T's Computer Systems division, and founder of Data Analytics Corp., he brings a wealth of knowledge about data analysis. As a market research consultant, he helps companies in a wide range of industries, including telecommunications, pharmaceuticals, jewelry, food and beverages, and automotive, turn their market data into actionable market information. Walter is also currently an adjunct faculty member in the Department of Economics at Rutgers University and was formerly an adjunct in the Department of Mathematics & Statistics at The College of New Jersey. He is the author of two analytical books, Market Data Analysis Using JMP (SAS Press, 2016) and Pricing Analytics (Routledge, 2018), with a third forthcoming on quantitative methods for new product development (Routledge, 2019). You can learn more about Walter and his consulting company, Data Analytics Corp., at www.dataanalyticscorp.com.


Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction (25 minutes)

  • Lecture, demonstrations, and exercises: text data in a Big Data environment; the structure of text data.
  • Group Discussion
  • Q&A

Text Data Fundamentals (30 minutes)

  • Lecture, demonstrations, and exercises: processing and cleaning text data using Python’s built-in str methods, pandas string methods, and regular expressions.
  • Group Discussion
  • Q&A
  • Break (5 minutes)
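
As a rough illustration of the kind of cleaning this session covers (a sketch only, not the instructor's course material), the snippet below lowercases text with a built-in str method and uses regular expressions to remove punctuation and digits. The sample comments and the clean_text helper are hypothetical.

```python
import re

# Hypothetical customer comments; not data from the course.
raw_comments = [
    "  GREAT product!!! Will buy again :)  ",
    "Battery died after 2 days... very disappointed.",
    "Customer-service was OK, but shipping was SLOW!",
]


def clean_text(text: str) -> str:
    """Lowercase, replace non-letters with spaces, and collapse whitespace."""
    text = text.lower()                     # built-in str method
    text = re.sub(r"[^a-z\s]", " ", text)   # keep only letters and whitespace
    text = re.sub(r"\s+", " ", text)        # collapse runs of whitespace
    return text.strip()                     # drop leading/trailing spaces


cleaned = [clean_text(comment) for comment in raw_comments]
for line in cleaned:
    print(line)
```

With pandas, the same steps could be applied to a whole column through the Series.str accessor, for example Series.str.lower(), Series.str.replace(pattern, repl, regex=True), and Series.str.strip().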

Text Data Preprocessing (30 minutes)

  • Lecture, demonstrations, and exercises: creating a corpus; cleaning the corpus; doing simple analytics with clean text data.
  • Group Discussion
  • Q&A
  • Break (5 minutes)

Text Modeling (75 minutes)

  • Lecture, demonstrations, and exercises: removing punctuation and changing cases; tokenizing the corpus; creating a bag-of-words; deleting stop-words; creating a DTM; weighting the DTM.
  • Group Discussion
  • Q&A
  • Break (5 minutes)
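
The steps listed for this session (tokenizing, deleting stop words, creating a DTM, and weighting it) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the three-document corpus and the tiny stop-word list are made up, and the weighting uses the textbook form tf × ln(N / df), which may differ from what the course or a given package uses.

```python
import math
from collections import Counter

# Hypothetical mini-corpus: each string is one document.
corpus = [
    "the battery life is great",
    "the battery died after a week",
    "great screen and great price",
]

# A tiny illustrative stop-word list (an assumption, not a standard list).
stop_words = {"the", "is", "a", "and", "after"}

# Tokenize on whitespace and delete stop words.
tokens = [[w for w in doc.split() if w not in stop_words] for doc in corpus]

# Bag-of-words counts per document.
counts = [Counter(doc) for doc in tokens]

# Document-term matrix: rows are documents, columns are vocabulary terms.
vocab = sorted({w for doc in tokens for w in doc})
dtm = [[c[term] for term in vocab] for c in counts]

# TF-IDF weighting: raw term count times ln(N / document frequency).
n_docs = len(corpus)
doc_freq = {term: sum(1 for c in counts if term in c) for term in vocab}
tfidf = [
    [c[term] * math.log(n_docs / doc_freq[term]) for term in vocab]
    for c in counts
]

print(vocab)
for row in dtm:
    print(row)
```

Terms that appear in many documents (here, "battery" and "great") receive a smaller IDF factor than terms confined to a single document. Packages such as scikit-learn apply smoothed variants of this formula by default, so their numbers will not match this sketch exactly.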

Text Analysis (60 minutes)

  • Lecture, demonstrations, and exercises: word clouds; phrase extraction and analysis; hierarchical cluster analysis; topic modeling by Latent Semantic Analysis.
  • Group Discussion
  • Q&A
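
Latent Semantic Analysis rests on the singular value decomposition of the DTM. The sketch below, which assumes NumPy is installed and uses a made-up 4 × 5 matrix, keeps the two largest singular values to place documents and terms in a two-dimensional "topic" space. The choice of k = 2 is arbitrary and for illustration only.

```python
import numpy as np

# Hypothetical DTM: 4 documents (rows) x 5 terms (columns).
terms = ["battery", "charge", "screen", "display", "price"]
dtm = np.array(
    [
        [2, 1, 0, 0, 0],
        [1, 2, 0, 0, 1],
        [0, 0, 2, 1, 0],
        [0, 0, 1, 2, 1],
    ],
    dtype=float,
)

# Thin SVD: dtm = U @ diag(s) @ Vt, with singular values in descending order.
U, s, Vt = np.linalg.svd(dtm, full_matrices=False)

k = 2  # number of latent topics to keep (an arbitrary choice here)
doc_topics = U[:, :k] * s[:k]   # document coordinates in topic space
term_topics = Vt[:k, :]         # term loadings on each topic

# Rank-k approximation of the original DTM.
approx = doc_topics @ term_topics

# List the two terms with the largest absolute loading on each topic.
for i in range(k):
    top = np.argsort(-np.abs(term_topics[i]))[:2]
    print(f"topic {i}:", [terms[j] for j in top])
```

The signs of the singular vectors are not unique, so the loadings are ranked by absolute value rather than by raw value.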

Wrap-up (5 minutes)