Summary
In this chapter, we built a data pipeline that analyzes large quantities of streaming data containing unstructured text and applies NLP algorithms from external cloud services to extract sentiment and other important entities found in the text. We also built a PixieApp dashboard that displays live metrics with insights extracted from the tweets. Along the way, we discussed various techniques for analyzing data at scale, including Apache Spark Structured Streaming, Apache Kafka, and IBM Streaming Analytics. As always, the goal of these sample applications is to show the art of the possible in building data pipelines, with a special focus on leveraging existing frameworks, libraries, and cloud services.
In the next chapter, we'll discuss ...