15 Topic Modeling – Summarizing Financial News

In the last chapter, we used the bag-of-words (BOW) model to convert unstructured text data into a numerical format. This model ignores word order and represents each document as a word vector in which each entry reflects the relevance of a token to the document. The resulting document-term matrix (DTM), or its transpose, the term-document matrix, lets us compare documents to each other, or to a query vector, for similarity based on their token content and, therefore, find the proverbial needle in a haystack. It also provides informative features for classifying documents, as in our sentiment analysis examples.
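A minimal sketch of this idea, assuming a toy corpus of three hypothetical headlines and a deliberately naive regex tokenizer (both for illustration only): it builds a DTM of raw token counts with the standard library and ranks documents by cosine similarity to a query vector expressed in the same vocabulary. In practice, scikit-learn's `CountVectorizer` builds the same kind of matrix.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption for this sketch): lowercase, keep alphabetic runs
    return re.findall(r"[a-z]+", text.lower())

def build_dtm(docs):
    """Return (vocab, dtm), where dtm[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))  # one column per distinct token
    dtm = [[c[term] for term in vocab] for c in counts]
    return vocab, dtm

def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical headlines used only to illustrate the mechanics
docs = [
    "Stocks rallied as the central bank cut rates",
    "The central bank raised rates and stocks fell",
    "Oil prices climbed on supply concerns",
]
vocab, dtm = build_dtm(docs)

# Map the query onto the DTM vocabulary; tokens outside it are dropped
query = Counter(tokenize("central bank cut rates"))
q_vec = [query[term] for term in vocab]

scores = [cosine(row, q_vec) for row in dtm]
best = max(range(len(docs)), key=scores.__getitem__)
print(best, docs[best])
```

Because the query contains all four of its tokens in the first headline but only three in the second, the first headline ranks highest, while the oil headline shares no tokens with the query and scores zero.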

However, this document model produces both high-dimensional data and very sparse ...
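To make the dimensionality and sparsity concrete, the following sketch (again with hypothetical headlines and a naive tokenizer) counts how many cells of a small DTM are nonzero. Even four short headlines produce 25 columns, of which each row fills only six to eight. For a real news corpus with a vocabulary of tens of thousands of terms, the nonzero share is typically far smaller, which is why scikit-learn's `CountVectorizer` returns the DTM as a SciPy sparse matrix rather than a dense array.

```python
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption for this sketch): lowercase, keep alphabetic runs
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical headlines used only to illustrate the shape of the DTM
docs = [
    "Stocks rallied as the central bank cut rates",
    "The central bank raised rates and stocks fell",
    "Oil prices climbed on supply concerns",
    "Tech earnings beat estimates while bond yields rose",
]
# Each Counter stores only the nonzero entries of its DTM row, like a sparse row
counts = [Counter(tokenize(d)) for d in docs]
vocab = sorted(set().union(*counts))

n_docs, n_terms = len(docs), len(vocab)
nonzero = sum(len(c) for c in counts)          # cells with a count > 0
density = nonzero / (n_docs * n_terms)         # share of nonzero cells
print(f"{n_docs} docs x {n_terms} terms, {density:.1%} nonzero")
```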


Machine Learning for Algorithmic Trading - Second Edition by Stefan Jansen
