Here we consider the implications of overreaction and underreaction to news, ask whether there is a “hierarchy” of information, and examine which news items are deemed most important. The majority of our quantitative research focuses on companies' reported balance sheet or P&L data and on sell-side analysts' estimates. Only recently have we been able to go beyond this and understand the motivations behind corporates' and fundamental analysts' decisions by looking at higher-frequency news flow datasets. Over the past few years several data vendors have started to collect and translate headlines and text from sources worldwide, including electronic newswires, newspapers, and magazines. News items are categorized, tagged, and uploaded so that they can be downloaded no later than the close of business on the day of release. Many news vendors provide low-latency data feeds and analyse the sentiment of stories within milliseconds of the news release.

We begin by considering the issues surrounding cleaning news data to ensure the collection of both timely and relevant information, distinguishing between news types, identifying mixed and stand-alone events, and deciphering informational content. We highlight five key issues specific to analysing news flow.

8.2.1 Timeliness of news

The first challenge is to define an information event. How do we distinguish “new” news from what has already been reported? We look beyond just earnings ...
