9 Bayesian Analysis for Text Data

Abstract

This chapter provides an introduction to Bayesian analysis of text data using the Latent Dirichlet Allocation (LDA) model, which represents text data in terms of topics and the probabilities of words within topics. Textual responses are modeled as a mixture of pure types, or archetypes, that form a convex hull characterizing the distribution of respondent heterogeneity. A topic probability vector characterizes the words provided by each respondent and can be used to form integrated models that combine textual response with choice and scaled response. A conjoint dataset is used to illustrate the model. We find that the text data help clarify the origin of demand.
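The structure described above can be illustrated with a small simulation of the standard LDA generative process. This is a minimal sketch, not the chapter's estimation code: the function name `simulate_lda`, the symmetric Dirichlet hyperparameters `alpha` and `beta`, and all sizes are assumptions chosen for illustration. Each row of `phi` holds the word probabilities within one topic (a "pure type"), and each row of `theta` is one respondent's topic probability vector.

```python
import numpy as np

def simulate_lda(n_docs=5, n_words=20, n_topics=3, vocab_size=10,
                 alpha=0.5, beta=0.5, seed=0):
    """Simulate text from the standard LDA generative process.

    All argument names and default values are hypothetical choices
    for illustration; they are not taken from the chapter.
    """
    rng = np.random.default_rng(seed)
    # phi[k]: probabilities of words within topic k (one archetype per row)
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    # theta[d]: topic probability vector for respondent d
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_docs)
    docs = []
    for d in range(n_docs):
        # draw a topic for each word position, then a word from that topic
        z = rng.choice(n_topics, size=n_words, p=theta[d])
        w = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])
        docs.append(w)
    return phi, theta, docs

phi, theta, docs = simulate_lda()
# Each respondent's implied word distribution is a convex combination of
# the topic rows, so it lies inside the convex hull of the archetypes.
word_dist = theta @ phi
```

Because every row of `theta` is nonnegative and sums to one, every row of `word_dist` is itself a probability vector over the vocabulary; this is the sense in which the topics act as the corners of a convex hull.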

9.1 INTRODUCTION

Most data available for analysis in marketing are provided as text responses and narratives related to products and their use. These data are useful in understanding precursors to demand, whether that demand is observed in the marketplace or takes the form of stated preferences arising out of choice experiments. The opinions, beliefs, and perceptions of consumers provide qualitative insight into how consumers find value in the features embodied in a product offering, identify opportunities for product improvement, and provide guidance to firms on how best to communicate the benefits of their offerings.

Textual responses come from many sources, including consumer product reviews, transcripts from focus group interviews, and open‐ended questions in surveys. The advantage of textual responses is that ...
