Hands-On Machine Learning for Algorithmic Trading

by Stefan Jansen
December 2018
Beginner to intermediate
684 pages
21h 9m
English
Packt Publishing

Parsing and tokenizing text data

A token is an instance of a sequence of characters that appears in a given document and should be considered a semantic unit for further processing. The vocabulary is the set of tokens contained in a corpus that are deemed relevant for further processing. The following decisions involve a key trade-off: reflecting the source text more accurately comes at the expense of a larger vocabulary, which may translate into more features and higher model complexity.
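As an illustration of the two definitions above, here is a minimal sketch using only the Python standard library. It assumes a naive tokenizer that treats every run of word characters as one token, and a hypothetical `min_count` threshold as the criterion for a token being "deemed relevant"; the corpus, `tokenize`, and `build_vocabulary` are all hypothetical names introduced for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-corpus used only for illustration.
corpus = [
    "The Fed raised rates. Markets fell.",
    "Markets rallied after the Fed paused.",
]


def tokenize(document):
    # Naive assumption: each run of word characters is one token.
    # Real tokenizers also handle contractions, tickers, numbers, etc.
    return re.findall(r"\w+", document)


def build_vocabulary(documents, min_count=1):
    # Count tokens across the whole corpus, then keep those that occur
    # at least min_count times (a hypothetical relevance criterion).
    counts = Counter(token for doc in documents for token in tokenize(doc))
    return {token for token, count in counts.items() if count >= min_count}


print(tokenize(corpus[0]))  # ['The', 'Fed', 'raised', 'rates', 'Markets', 'fell']
print(len(build_vocabulary(corpus)))  # 10: "The" and "the" are distinct tokens
print(sorted(build_vocabulary(corpus, min_count=2)))  # ['Fed', 'Markets']
```

Because no normalization is applied, "The" and "the" end up as separate vocabulary entries, which is exactly the kind of choice the next paragraph discusses.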

Basic choices in this regard concern the treatment of punctuation and capitalization, the use of spelling correction, and whether to exclude very frequent so-called stop words (such as "and" or "the") as meaningless noise.
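The effect of these choices on vocabulary size can be sketched as follows. This is a simplified illustration, not a production pipeline: the `normalize` function and its flags are hypothetical, the stop-word list is a tiny made-up set rather than a standard list, and spelling correction is not shown.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "or", "the", "of", "to"}


def normalize(document, lowercase=True, strip_punctuation=True,
              remove_stop_words=True):
    # Optionally fold case so that "The" and "the" map to one token.
    text = document.lower() if lowercase else document
    if strip_punctuation:
        # Keep only runs of word characters.
        tokens = re.findall(r"\w+", text)
    else:
        # Keep each punctuation mark as a separate token.
        tokens = re.findall(r"\w+|[^\w\s]", text)
    if remove_stop_words:
        # Compare case-insensitively against the stop-word list.
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return tokens


sentence = "The Fed and the ECB held rates."
raw = normalize(sentence, lowercase=False, strip_punctuation=False,
                remove_stop_words=False)
clean = normalize(sentence)
print(raw)    # ['The', 'Fed', 'and', 'the', 'ECB', 'held', 'rates', '.']
print(clean)  # ['fed', 'ecb', 'held', 'rates']
print(len(set(raw)), len(set(clean)))  # 8 4
```

Each normalization step shrinks the vocabulary, trading some fidelity to the source (for example, the distinction between "ECB" and "ecb") for fewer features.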

An additional decision is about the inclusion of groups of n individual ...
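The sentence above is cut off in this preview. Assuming it refers to n-grams, that is, sequences of n consecutive tokens treated as additional vocabulary entries, the idea can be sketched with a sliding window. The `ngrams` helper and the underscore join are hypothetical choices for illustration.

```python
def ngrams(tokens, n):
    # Slide a window of n consecutive tokens over the sequence and join
    # each window into one feature name (underscore join is an assumption).
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["fed", "raised", "interest", "rates"]
bigrams = ngrams(tokens, 2)
print(bigrams)  # ['fed_raised', 'raised_interest', 'interest_rates']
print(ngrams(tokens, 3))  # ['fed_raised_interest', 'raised_interest_rates']
# Unigrams plus bigrams: 4 + 3 = 7 features for this short sequence.
print(len(tokens + bigrams))  # 7
```

Adding n-grams can capture phrases such as "interest rates" that carry meaning beyond their individual words, but it increases the number of features, which is the same fidelity-versus-complexity trade-off described above.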

ISBN: 9781789346411