Introducing Spark andKafka | 137
val emp_country = hiveContext.sql("select distinct(ctry) from empSpark")
emp_country.collect.foreach(println)
// Display five records from the table 'author_hive' in 'sqoopdb'
val author_sample = hiveContext.sql("select * from sqoopdb.author_hive limit 5")
author_sample.collect.foreach(println)
// Display the total number of records in the table 'author_hive' in 'sqoopdb'
val author_count = hiveContext.sql("select count(*) from sqoopdb.author_hive")
author_count.collect.foreach(println)
6.1.5 Spark Libraries: Streaming
The Spark Streaming library processes streaming data. It is very popular because it extends
Spark’s big data processing power to ‘fast data’. Spark Streaming has proven able to stream
gigabytes per second (see Figure 6.10). Combining ‘big data’ and ‘fast data’ capabilities
has huge potential, ranging from real-time fraud detection to marketing that is relevant to
a customer now, rather than marketing based on the customer’s intention from last week.
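As a rough illustration of the fraud detection use case mentioned above, the following is a minimal sketch of a Spark Streaming job, not a production design. It assumes transactions arrive as text lines of the form account,amount on a local socket. The host localhost, the port 9999, the 5-second batch interval, the threshold value and the names FraudAlertSketch, Threshold and parse are all hypothetical choices made for this example.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FraudAlertSketch {
  // Hypothetical rule: flag any transaction above this amount.
  val Threshold = 10000.0

  // Parse a line of the assumed form "account,amount".
  // Malformed lines yield None and are dropped by the caller.
  def parse(line: String): Option[(String, Double)] =
    line.split(",") match {
      case Array(account, amount) =>
        scala.util.Try(amount.trim.toDouble).toOption.map(a => (account.trim, a))
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the socket receiver, one for processing.
    val conf = new SparkConf().setAppName("FraudAlertSketch").setMaster("local[2]")
    // Incoming data is grouped into micro-batches of 5 seconds (assumed interval).
    val ssc = new StreamingContext(conf, Seconds(5))

    // Assumed source: a text stream on localhost:9999.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Keep only well-formed transactions above the threshold.
    val suspicious = lines
      .flatMap(line => parse(line).toList)
      .filter { case (_, amount) => amount > Threshold }

    // Print the first few flagged transactions of every batch.
    suspicious.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

To try the sketch locally, a test source such as nc -lk 9999 can feed lines like acct-1,12500.50 into the socket. Each batch then prints the transactions that exceed the threshold within seconds of their arrival, which is the ‘fast data’ behaviour described above.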