January 2019
Beginner to intermediate
154 pages
4h 31m
English
Spark SQL allows users to query a wide variety of data sources. These sources can be files, such as CSV or Parquet, or external databases reached through Java Database Connectivity (JDBC).
There are a couple of ways to load data. Let's take a look at both methods:
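//Scala
// Needed outside spark-shell so that map() can find an Encoder for String
import spark.implicits._

// Read a tab-separated file that has a header row
val sales_df = spark.read.option("sep", "\t").option("header", "true").csv("file:///opt/data/sales/sample_10000.txt")
// Save it as Parquet, then read the Parquet copy back
sales_df.write.parquet("sales.parquet")
val parquet_sales_DF = spark.read.parquet("sales.parquet")
// Register a temporary view so the data can be queried with SQL
parquet_sales_DF.createOrReplaceTempView("parquetSales")
val ipDF = spark.sql("SELECT ip FROM parquetSales WHERE id BETWEEN 10 AND 19")
ipDF.map(attributes => "IPS: " + attributes(0)).show()

//Java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
Dataset<Row> ...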
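As a rough illustration of how one file can be loaded in more than one way, the sketch below reads the same tab-separated data through the shorthand `csv(...)` reader and through the generic `format("csv").load(...)` reader. This pair is an assumption chosen for illustration and may not be the pair of methods the text has in mind. The object name `LoadSketch`, its helper functions, the temporary directory, and the three sample rows are all hypothetical and do not come from the original sales file.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object LoadSketch {
  // Shorthand reader: the format is implied by the method name.
  def loadShorthand(spark: SparkSession, path: String): DataFrame =
    spark.read.option("sep", "\t").option("header", "true").csv(path)

  // Generic reader: the format is named explicitly, then load() is called.
  def loadGeneric(spark: SparkSession, path: String): DataFrame =
    spark.read.format("csv").option("sep", "\t").option("header", "true").load(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("load-sketch")
      .master("local[*]") // assumption: a local run, not a cluster
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data written to a temporary directory.
    val path = Files.createTempDirectory("load_sketch").resolve("sales_tsv").toString
    Seq((10, "10.0.0.1"), (11, "10.0.0.2"), (25, "10.0.0.3"))
      .toDF("id", "ip")
      .write.option("sep", "\t").option("header", "true").csv(path)

    val viaShorthand = loadShorthand(spark, path)
    val viaGeneric = loadGeneric(spark, path)

    // Both readers should expose the same columns and the same number of rows.
    println(viaShorthand.columns.sameElements(viaGeneric.columns))
    println(viaShorthand.count() == viaGeneric.count())

    spark.stop()
  }
}
```

The generic form is handy when the format name is only known at runtime, for example when it comes from configuration. The shorthand form reads more naturally when the format is fixed.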