Chapter 19. Spark Streaming Sources
As you learned earlier in Chapter 2, a streaming source is a data provider that continuously delivers data. In Spark Streaming, sources are adaptors running within the context of the Spark Streaming job that implement the interaction with the external streaming source and provide the data to Spark Streaming using the DStream abstraction. From the programming perspective, consuming a streaming data source means creating a DStream using the appropriate implementation for the corresponding source.
Example 19-1. Creating a text stream from a socket connection

// creates a DStream using a client socket connected to the given host and port
val textDStream: DStream[String] = ssc.socketTextStream("localhost", 9876)
In Example 19-1, we can see that the creation of a streaming source is provided by a dedicated implementation. In this case, it is a method on the ssc instance, the streaming context, and it results in a DStream[String] that contains the text data delivered by the socket; the type parameter of the DStream reflects the type of its content.
Although the implementation for each source is different, this pattern is the same for all of them: creating a source requires a streamingContext and results in a DStream that represents the contents of the stream. The streaming application further operates on the resulting DStream to implement the ...
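To make the pattern concrete, the following is a minimal sketch of a complete application around a socket source. The host, port, batch interval, and word-count logic are illustrative assumptions, not part of the original example:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object SocketSourceSketch {
  def main(args: Array[String]): Unit = {
    // a local StreamingContext with a 10-second batch interval (illustrative values)
    val conf = new SparkConf().setMaster("local[2]").setAppName("socket-source-sketch")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // the common pattern: the streaming context creates the DStream for the source
    val lines: DStream[String] = ssc.socketTextStream("localhost", 9876)

    // the application then operates on the resulting DStream
    lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that the source adaptor itself (here, the socket client) runs inside the streaming job; the application code never interacts with the socket directly, only with the DStream it produces.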