4

CORRELATION AND CO-OCCURRENCE

Correlation and co-occurrence analysis refers to a class of techniques that explore relationships among the ways words are used in a body of text. Words that commonly appear near each other are known as collocates and represent concepts that have some form of semantic connection within the text being analyzed. For example, foreign news articles mentioning Barack Obama often preface his name with the country he governs and his title, as in "United States President Barack Obama." A correlation analysis would identify the high level of co-occurrence of these phrases and draw a connection between the person, Barack Obama, and the position he occupies, President of the United States.
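To make the idea of collocates concrete, the following is a minimal sketch of window-based co-occurrence counting in Python. It is an illustration only, not the specific method used in this book: the function name cooccurrence_counts, the window size of three tokens, and the sample sentences are all assumptions invented for the example, and tokenization is reduced to splitting on whitespace.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=3):
    """Count unordered word pairs that fall within `window` tokens of each other."""
    counts = Counter()
    for i, word in enumerate(tokens):
        # Look ahead only, so each pair of positions is counted once.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                # Sort the pair so (a, b) and (b, a) share one key.
                counts[tuple(sorted((word, other)))] += 1
    return counts


# Hypothetical sample text, invented for illustration.
sentences = [
    "united states president barack obama met european leaders",
    "united states president barack obama addressed the summit",
    "the summit was held in berlin",
]

totals = Counter()
for sentence in sentences:
    # Count within each sentence so pairs never span a sentence boundary.
    totals.update(cooccurrence_counts(sentence.split(), window=3))

# Show the most frequent collocate pairs.
for pair, n in totals.most_common(5):
    print(pair, n)
```

In this toy corpus, pairs such as "president" and "obama" recur across sentences, while "obama" and "berlin" never fall inside the same window. Raw counts like these are only a starting point; a fuller analysis would typically normalize them against how often each word appears on its own.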

Understanding Correlation
