Consider the amount of data that is being generated as you read this paragraph. Much of that data will be stored and processed to gain insight and value: for example, a temperature monitor within the house that gives constant updates, or a feed from various social media platforms. There is a great deal of potential in data that arrives in real time. Capturing, processing, storing, and learning from it can be difficult, but emerging tools are making it easier to put solutions together.
This chapter covers the use of Spring XD for consuming real-time data using the Twitter streaming application programming interface (API). The examples show you how to write custom processors in Spring XD to perform real-time analysis on the incoming Twitter data (tweets).
Capturing the Firehose of Data
Companies that provide continuous streams of data often refer to such a stream as “the firehose”: the data just flows out, and it's up to recipients to capture the data they want and process it as required. Often, some form of agreement must be signed with the data provider before the data is made available for consumption.
Considerations for Using Data in Real Time
Before dashing off to your desk and coding up a real-time application that scans the entire Twitter firehose, it's worth considering whether real time is actually the way to go. Just because real-time processing is available doesn't mean you should always use it.