Hands-On Machine Learning for Algorithmic Trading

by Stefan Jansen
December 2018
Beginner to intermediate
684 pages
21h 9m
English
Packt Publishing
Automatic phrase detection

Preprocessing typically involves phrase detection, that is, the identification of tokens that are commonly used together and should receive a single vector representation (for example, New York City; see the discussion of n-grams in Chapter 13, Working with Text Data).

The original Word2vec authors use a simple lift scoring method that identifies two words w_i, w_j as a bigram if their joint occurrence exceeds a given threshold relative to each word's individual appearance, corrected by a discount factor δ:

$$\text{score}(w_i, w_j) = \frac{\text{count}(w_i, w_j) - \delta}{\text{count}(w_i)\,\text{count}(w_j)}$$

The scorer can be applied repeatedly to identify successively longer phrases.
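To make the repeated application concrete, here is a minimal pure-Python sketch of the lift scorer, assuming whitespace-tokenized sentences and a greedy left-to-right merge. The function names (`score_bigrams`, `merge_phrases`), the toy corpus, and the threshold and discount values are illustrative assumptions, not the book's code or any library's API.

```python
from collections import Counter


def score_bigrams(sentences, delta=1.0):
    """Score each adjacent token pair as (count(a, b) - delta) / (count(a) * count(b))."""
    unigrams = Counter(tok for sent in sentences for tok in sent)
    bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
    return {
        (a, b): (n - delta) / (unigrams[a] * unigrams[b])
        for (a, b), n in bigrams.items()
    }


def merge_phrases(sentences, threshold, delta=1.0, sep="_"):
    """Run one pass: join adjacent tokens whose score exceeds the threshold."""
    scores = score_bigrams(sentences, delta)
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            # Greedy left-to-right merge; a merged token is not reused in this pass.
            if i + 1 < len(sent) and scores[(sent[i], sent[i + 1])] > threshold:
                out.append(sent[i] + sep + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


# Toy corpus (hypothetical data, chosen only to illustrate the mechanics).
sentences = [
    ["i", "love", "new", "york", "city"],
    ["new", "york", "city", "is", "big"],
    ["she", "moved", "to", "new", "york", "city"],
    ["a", "big", "city", "is", "loud"],
]

once = merge_phrases(sentences, threshold=0.15)   # first pass finds bigrams
twice = merge_phrases(once, threshold=0.15)       # second pass extends them to trigrams
print(once[0])
print(twice[0])
```

On this toy corpus, the first pass joins "new" and "york", and the second pass treats the merged token as a single unit and joins it with "city". In practice, the threshold and δ would need tuning to the corpus size, since raw scores shrink as counts grow.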

An alternative is the normalized ...

ISBN: 9781789346411