July 2016
M. Thomas*; H. Bretz*; T. Vacek*; B. Hachey†; S. Singh*; F. Schilder*
* Thomson Reuters, NYC, NY, USA
† University of Sydney, Sydney, Australia
We describe Newton, an entity detection and resolution system used to identify company names in Reuters news articles and to ground each mention to an entry in a company authority database. The system must be fast and precise on arbitrary web news sources. We introduce an infrastructure for authority-driven lookup tagging, followed by joint mention and disambiguation classification using a support vector machine. Performance on a corpus of 70k automatically annotated documents from the ...
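The lookup-tagging stage mentioned in the abstract can be illustrated with a minimal sketch. It is not Newton's implementation: the `AUTHORITY` table, the company identifiers, the `Candidate` type, and the greedy longest-match strategy are all assumptions made for illustration. The sketch only produces candidate mentions with their possible authority entries; the later classification step (a support vector machine in the abstract) would decide which candidates are true mentions and which entry each refers to, and is not shown here.

```python
import re
from typing import NamedTuple


class Candidate(NamedTuple):
    start: int            # character offset where the mention begins
    end: int              # character offset just past the mention
    surface: str          # mention text as it appears in the document
    company_ids: tuple    # authority entries this alias could refer to


# Hypothetical authority table: lowercased alias -> candidate company IDs.
# A real authority database would be far larger and carry richer metadata.
AUTHORITY = {
    "apple": ("C001",),
    "apple inc": ("C001",),
    "thomson reuters": ("C002",),
    "reuters": ("C002", "C003"),  # ambiguous alias, left for the classifier
}

MAX_ALIAS_TOKENS = max(len(alias.split()) for alias in AUTHORITY)


def lookup_tag(text):
    """Return candidate mentions using greedy longest-match alias lookup."""
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    candidates = []
    i = 0
    while i < len(tokens):
        matched_len = 0
        # Try the longest window first so "Thomson Reuters" beats "Reuters".
        for n in range(min(MAX_ALIAS_TOKENS, len(tokens) - i), 0, -1):
            key = " ".join(tok.lower() for tok, _, _ in tokens[i:i + n])
            if key in AUTHORITY:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                candidates.append(
                    Candidate(start, end, text[start:end], AUTHORITY[key])
                )
                matched_len = n
                break
        # Skip past a matched alias, otherwise advance by one token.
        i += matched_len if matched_len else 1
    return candidates
```

Calling `lookup_tag("Thomson Reuters said Apple Inc. shares rose; Reuters reported.")` yields three candidates. The standalone "Reuters" keeps both of its hypothetical identifiers, which is the kind of ambiguity a disambiguation classifier would then resolve.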