March 2019
Beginner to intermediate
182 pages
4h 6m
English
Spark SQL is one of the four components on top of the Spark platform, as we saw earlier in the chapter. It can be used to execute SQL queries or to read data from any existing Hive installation, where Hive is a data warehouse system that is also an Apache project. Queries written for Spark SQL look very similar to those written for MySQL or Postgres. The following code snippet is a good example:
# Register the DataFrame as a SQL temporary view
df.createOrReplaceTempView("people")

# Query the view with a standard SQL statement
sqlDF = spark.sql("SELECT * FROM people")
sqlDF.show()
# +----+-------+
# | age|   name|
# +----+-------+
# |null|Jackson|
# |  30| Martin|
# |  19| Melvin|
# +----+-------+
Here, you select all the columns from a certain table, such as people, and, using the Spark session object, you feed in a very standard-looking SQL statement, ...