3.4 A FRAMEWORK FOR REAL-TIME NEWS ANALYTICS

The core of our real-time news analysis engine is a scoring method that assesses the relative volume and significance of news in a specific category. For instance, we wish to identify periods when the volume of news about foreign exchange markets is abnormally high, or when there is a flurry of macroeconomic news announcements.

For a given topic, say foreign exchange news, the scoring procedure has the following parameters:

  • A list of keywords/key phrases and real-valued weights: (W1, γ1),…, (Wk, γk).
  • A rolling window size, l (typically about 5–10 minutes).
  • A calibration rolling window size, L (typically about 90 days).

The keyword list and the last l minutes of news are used to create a raw score, which is then normalized/calibrated using statistics about the news over the last L days (as described below).
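
As an illustration only, the three parameters listed above could be grouped into a small configuration object. The sketch below is not taken from the engine itself: the class name TopicConfig, the field names, and the example keywords and weights are all hypothetical. Only the window sizes (10 minutes, 90 days) come from the typical values quoted above.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Tuple


@dataclass(frozen=True)
class TopicConfig:
    """Scoring parameters for one news topic (hypothetical container)."""

    # (W_i, gamma_i) pairs: keyword or key phrase and its real-valued weight.
    keywords: Tuple[Tuple[str, float], ...]
    # Rolling window size l used for the raw score.
    window: timedelta
    # Calibration rolling window size L used for normalization.
    calibration: timedelta

    def __post_init__(self) -> None:
        if not self.keywords:
            raise ValueError("at least one keyword is required")
        if self.window <= timedelta(0):
            raise ValueError("the rolling window must be positive")
        if self.calibration <= self.window:
            raise ValueError("the calibration window must exceed the rolling window")


# Example values are invented for illustration; they are not from the text.
FX_NEWS = TopicConfig(
    keywords=(("foreign exchange", 2.0), ("currency", 1.0), ("EUR/USD", 1.5)),
    window=timedelta(minutes=10),
    calibration=timedelta(days=90),
)
```

Keeping the two windows as explicit durations makes the difference in scale visible: the raw score looks back minutes, while calibration looks back months.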

3.4.1 Assigning scores to news

The score at a given point in time, t, is assigned as follows: Let (w1,…, wk) be the vector of keyword frequencies in the time interval [t − l, t) (i.e., wi is the number of times word/phrase Wi has appeared in the last l minutes). The raw score at time t is then defined to be:

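[Equation image not reproduced in this extract: definition of the raw score at time t in terms of the weights γ1,…, γk and the frequencies w1,…, wk]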
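
The formula itself does not survive in this text. As a minimal sketch, assuming the raw score is the weighted sum of keyword frequencies, γ1w1 + … + γkwk (an assumption consistent with the definitions above, not a form confirmed here), the computation might look as follows. The function names, the (timestamp, text) representation of news items, and the sample headlines are hypothetical.

```python
from datetime import datetime, timedelta


def keyword_frequencies(news, phrases, t, window):
    """Count occurrences of each phrase in items stamped in [t - window, t).

    `news` is an iterable of (timestamp, text) pairs. Matching is a naive,
    case-insensitive substring count, so it also matches inside longer words.
    """
    start = t - window
    counts = [0] * len(phrases)
    for stamp, text in news:
        if start <= stamp < t:  # half-open interval: t itself is excluded
            lowered = text.lower()
            for i, phrase in enumerate(phrases):
                counts[i] += lowered.count(phrase.lower())
    return counts


def raw_score(frequencies, weights):
    """Weighted sum of frequencies (ASSUMED form of the raw score)."""
    return sum(g * w for g, w in zip(weights, frequencies))


# Invented sample data for illustration only.
t = datetime(2024, 1, 1, 12, 0)
news = [
    (t - timedelta(minutes=3), "Currency markets rally as dollar slips"),
    (t - timedelta(minutes=1), "Central bank comments move currency and bond markets"),
    (t - timedelta(minutes=30), "Currency traders await data"),  # before the window
    (t, "Currency update"),  # stamped exactly at t, so excluded
]
phrases = ["currency", "dollar"]
weights = [1.0, 0.5]

w = keyword_frequencies(news, phrases, t, timedelta(minutes=10))
score = raw_score(w, weights)
```

Under this assumed form, every additional keyword occurrence with a positive weight raises the score, so the raw score rises with overall news volume.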

In this form, the raw score will tend to be high when news volume is high, and so we calibrate/normalize the score using the calibration rolling window: We maintain a record of the scores that have been assigned ...
