An example based on text documents

We will now turn to an example that comes from a study performed at Carnegie Mellon University by Prof. Noah Smith's research group. The study was based on mining the so-called 10-K reports that companies file with the Securities and Exchange Commission (SEC) in the United States. This filing is mandated by law for all publicly traded companies. The goal of their study was to predict, based on this piece of public information, what the future volatility of the company's stock would be. In the training data, we are actually using historical data for which we already know the outcome.

There are 16,087 examples available. The features, which have already been preprocessed for us, correspond to different words, ...

Get Building Machine Learning Systems with Python - Third Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.