Connecting to social networks

Let's delve into the first steps of the integration layer of the data-intensive app architecture. We are going to focus on harvesting the data, ensuring its integrity, and preparing it for batch and streaming data processing by Spark at the next stage. This phase is described in five process steps: connect, correct, collect, compose, and consume. These are iterative steps of data exploration that will get us acquainted with the data and help us refine the data structure for further processing.

The following diagram depicts the iterative process of data acquisition and refinement for consumption:
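To make the five steps concrete, here is a minimal sketch in plain Python of one way the connect, correct, collect, compose, and consume loop could be wired together. Everything in it is a hypothetical illustration, not the book's code: the function names, the sample records, and the "posts per user" aggregation are assumptions. In a real pipeline, `connect` would call a social network API and `consume` would hand data to Spark.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def connect() -> List[Record]:
    # Hypothetical stand-in for a social network API call; returns raw,
    # possibly messy records (note the duplicate and the empty post).
    return [
        {"id": "1", "user": "alice", "text": "  Spark is fast  "},
        {"id": "2", "user": "bob", "text": ""},
        {"id": "1", "user": "alice", "text": "  Spark is fast  "},
        {"id": "3", "user": "bob", "text": "PySpark rocks"},
    ]


def correct(records: List[Record]) -> List[Record]:
    # Enforce integrity: drop records with no text and trim whitespace.
    cleaned = []
    for record in records:
        text = record.get("text", "").strip()
        if text:
            cleaned.append({**record, "text": text})
    return cleaned


def collect(records: List[Record], store: Dict[str, Record]) -> Dict[str, Record]:
    # Persist records keyed by id, so re-harvested duplicates overwrite
    # earlier copies instead of piling up.
    for record in records:
        store[record["id"]] = record
    return store


def compose(store: Dict[str, Record]) -> Dict[str, int]:
    # Shape the stored data for downstream processing: posts per user.
    counts: Dict[str, int] = {}
    for record in store.values():
        counts[record["user"]] = counts.get(record["user"], 0) + 1
    return counts


def consume(summary: Dict[str, int]) -> List[Tuple[str, int]]:
    # Hand the composed result to a consumer (here, a sorted list).
    return sorted(summary.items())


def run_pipeline(iterations: int = 2) -> List[Tuple[str, int]]:
    # Repeat the five steps; each pass is a chance to refine the rules above.
    store: Dict[str, Record] = {}
    result: List[Tuple[str, int]] = []
    for _ in range(iterations):
        raw = connect()
        store = collect(correct(raw), store)
        result = consume(compose(store))
    return result


print(run_pipeline())  # [('alice', 1), ('bob', 1)]
```

Because `collect` keys records by id, running the loop repeatedly against the same source does not inflate the counts. This is one simple way to keep an iterative harvest idempotent while the correction and composition rules are still being refined.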


We connect to the social networks ...
