15
Topic Modeling – Summarizing Financial News
In the last chapter, we used the bag-of-words (BOW) model to convert unstructured text data into a numerical format. This model abstracts from word order and represents documents as word vectors, where each entry represents the relevance of a token to the document. The resulting document-term matrix (DTM), or its transpose, the term-document matrix, is useful for comparing documents with each other, or with a query vector, for similarity based on their token content. It therefore helps find the proverbial needle in a haystack. It also provides informative features for classifying documents, as in our sentiment analysis examples.
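The following is a minimal plain-Python sketch of the idea described above: it builds a small DTM and ranks documents against a query vector by cosine similarity. It assumes simple lowercase regex tokenization and raw term counts as the relevance weights. The chapter's own examples may instead use a library vectorizer and a different weighting, such as TF-IDF. The headlines and the helper names `tokenize`, `build_dtm`, `vectorize`, and `cosine` are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())

def build_dtm(docs):
    """Return (vocab, dtm) where dtm[i][j] counts term vocab[j] in docs[i].

    Counting tokens per document discards word order (bag-of-words).
    """
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))  # union of all distinct tokens
    dtm = [[c[term] for term in vocab] for c in counts]
    return vocab, dtm

def vectorize(text, vocab):
    # Map a query onto the same columns; out-of-vocabulary tokens are dropped.
    c = Counter(tokenize(text))
    return [c[term] for term in vocab]

def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical example headlines (not data from the book).
docs = [
    "Stocks rally as earnings beat expectations",
    "Central bank raises interest rates",
    "Earnings miss sends stocks lower",
]
vocab, dtm = build_dtm(docs)
query = vectorize("stocks earnings", vocab)
scores = [cosine(query, row) for row in dtm]
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])
```

Both the first and third headlines share the two query terms, but the third scores higher. Cosine similarity divides by the vector norms, so the shorter headline, with fewer unrelated tokens diluting its vector, is judged closer to the query. The second headline shares no tokens with the query and scores zero.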
However, this document model produces both high-dimensional data and very sparse ...