The core of our real-time news analysis engine is a scoring method that assesses the relative volume and significance of news in a specific category. For instance, we wish to identify periods when the volume of news about foreign exchange markets is abnormally high, or when there is a flurry of macroeconomic news announcements.

For a given topic, say foreign exchange news, the scoring procedure has the following parameters:

  • A list of keywords/key phrases and real-valued weights: (W1, γ1), …, (Wk, γk).
  • A rolling window size, l (typically about 5–10 minutes).
  • A calibration rolling window size, L (typically about 90 days).

The keyword list and the last l minutes of news are used to compute a raw score, which is then normalized (calibrated) using statistics about the news over the last L days, as described below.
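
As a minimal sketch, the three parameters of a topic could be grouped into a single configuration object. The class name `TopicScoringParams`, its field names, and the example keywords and weights are all hypothetical illustrations, not taken from the engine itself; only the window sizes follow the typical values quoted above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class TopicScoringParams:
    """Hypothetical container for the scoring parameters of one topic."""

    keyword_weights: dict[str, float]  # maps keyword/phrase W_i to its weight γ_i
    window: timedelta                  # rolling window size l
    calibration_window: timedelta      # calibration rolling window size L

    def __post_init__(self) -> None:
        # Basic sanity checks (an assumption of this sketch, not from the text).
        if not self.keyword_weights:
            raise ValueError("at least one keyword is required")
        if self.window <= timedelta(0):
            raise ValueError("window must be positive")
        if self.calibration_window <= self.window:
            raise ValueError("calibration window must exceed the rolling window")


# Illustrative foreign exchange topic; keywords and weights are made up.
fx_params = TopicScoringParams(
    keyword_weights={"currency": 1.0, "exchange rate": 2.0, "central bank": 1.5},
    window=timedelta(minutes=5),
    calibration_window=timedelta(days=90),
)
```

Making the object immutable (`frozen=True`) is a design choice of the sketch: a topic's parameters stay fixed while scores are being accumulated against them.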

3.4.1 Assigning scores to news

The score at a given point in time, t, is assigned as follows. Let (w1, …, wk) be the vector of keyword frequencies in the time interval [t − l, t) (i.e., wi is the number of times word/phrase Wi has appeared in the last l minutes). The raw score at time t is then defined to be the weighted sum of these frequencies:

    s(t) = γ1 w1 + … + γk wk

In this form, the raw score will tend to be high when news volume is high, and so we calibrate/normalize the score using the calibration rolling window: We maintain a record of the scores that have been assigned ...
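
The raw score over the rolling window [t − l, t) can be sketched as follows, assuming it is the weighted sum of keyword counts, with weight γi applied to count wi. The class `RawScorer`, the helper `count_phrase`, and the sample headlines are hypothetical. Keyword matching here is a naive case-insensitive substring count, which is a simplification rather than the engine's actual text processing.

```python
from collections import deque
from datetime import datetime, timedelta


def count_phrase(text: str, phrase: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of phrase in text.

    Naive substring matching: "rate" would also match inside "rates".
    """
    return text.lower().count(phrase.lower())


class RawScorer:
    """Hypothetical raw scorer for one topic.

    Assumes news items arrive in chronological order and that score()
    is queried at non-decreasing times.
    """

    def __init__(self, keyword_weights: dict[str, float], window: timedelta):
        self.keyword_weights = keyword_weights
        self.window = window
        self._items = deque()  # (timestamp, weighted keyword count of the item)

    def add(self, timestamp: datetime, text: str) -> None:
        # Each item's contribution is sum_i γ_i * (count of W_i in the item).
        contribution = sum(
            weight * count_phrase(text, keyword)
            for keyword, weight in self.keyword_weights.items()
        )
        self._items.append((timestamp, contribution))

    def score(self, t: datetime) -> float:
        # Discard items that fall before the window start t - l.
        while self._items and self._items[0][0] < t - self.window:
            self._items.popleft()
        # Sum contributions of items in [t - l, t); items at or after t are excluded.
        return sum(c for ts, c in self._items if ts < t)


t0 = datetime(2024, 1, 1, 12, 0)
scorer = RawScorer({"euro": 1.0, "exchange rate": 2.0}, timedelta(minutes=5))
scorer.add(t0, "Euro slides as exchange rate volatility rises")
scorer.add(t0 + timedelta(minutes=3), "ECB comments lift the euro; euro bulls return")

print(scorer.score(t0 + timedelta(minutes=4)))  # both items in window: 5.0
print(scorer.score(t0 + timedelta(minutes=6)))  # first item has expired: 2.0
```

Summing per-item contributions gives the same result as first accumulating the counts w1, …, wk over the window and then applying the weights, because the weighted sum is linear in the counts.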
