R Programming By Example

by Omar Trejo Navarro
December 2017
Beginner to intermediate
470 pages
12h 29m
English
Packt Publishing
Content preview from R Programming By Example

Improving our results with TF-IDF

In text analysis, a high raw count for a term inside a text does not necessarily mean that the term is more important for that text. One of the most important ways to normalize term frequencies is to weight a term by how often it appears not only in a text, but also in the entire corpus.

If a word appears often inside a given text but rarely across the rest of the corpus, it is probably important for that specific text. However, if the term appears a lot inside a text but also appears a lot in other texts in the corpus, it is probably not important for that specific text but for the entire corpus, and this dilutes its predictive power.
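The weighting idea can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the book's implementation: the toy corpus is made up, tokenization is a plain whitespace split, and the weight uses one common textbook form, raw term count multiplied by log(N / df), where N is the number of texts and df is the number of texts that contain the term. Other TF-IDF variants exist.

```r
# Toy corpus: three short texts, invented for illustration only.
docs <- c(
  d1 = "the food was great great great",
  d2 = "the service was slow",
  d3 = "the food was cold"
)

# Split on single spaces; a real pipeline would also handle case and punctuation.
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))

# Term frequency: raw count of each vocabulary term in each text
# (rows are texts, columns are terms).
tf <- t(vapply(
  tokens,
  function(words) as.numeric(table(factor(words, levels = vocab))),
  numeric(length(vocab))
))
colnames(tf) <- vocab

# Document frequency: number of texts in which each term appears at least once.
df <- colSums(tf > 0)

# Inverse document frequency (assumed form): log(N / df).
idf <- log(nrow(tf) / df)

# TF-IDF: multiply each term's column of counts by that term's IDF.
tf_idf <- sweep(tf, 2, idf, `*`)

print(round(tf_idf, 2))
```

In this toy corpus, "the" and "was" appear in every text, so their IDF is log(3 / 3) = 0 and their weight vanishes regardless of the raw count. In contrast, "great" appears three times in d1 and nowhere else, so it receives the largest weight in that text, 3 * log(3).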

In information retrieval (IR), TF-IDF is one of the most ...



Publisher Resources

ISBN: 9781788292542