Questions and answers with Lexalytics

As posed by the editors and answered by Jeff Catlin

This brief chapter poses practical questions on text analysis to Jeff Catlin, CEO of Lexalytics, Inc., a text analytics and sentiment software company based in Boston, MA, that provides solutions primarily to the finance, enterprise search, and reputation management industries. Among other applications, Lexalytics technology powers the Thomson Reuters News Analytics system. For more information on Lexalytics' capabilities, please look up “Directory of news analytics service providers” under Thomson Reuters (see p. 344).

So, Jeff, for those looking to analyze text, what are the biggest challenges that they will face when analyzing this largely unstructured content?

To be honest, people who work on text processing and search understand that getting control of the content is often a much bigger hurdle than building out an application. Not all content is created equal. If you consider content sources like Twitter and a Reuters newsfeed, it's pretty hard to imagine that those will ever be handled with identical approaches. Twitter content is badly formed, with little if any capitalization, punctuation, or grammar, while something like a Reuters feed is well formed but more verbose, so very different approaches must be used to process these two distinctly different types of content. As you narrow the focus to content that is specific to financial services, the problems change slightly.

There are some good ...
