November 2018
Intermediate to advanced
360 pages
English
Let's take a look at the following steps:
import sys
sys.path.append('/PATH_TO/spark-2.3.2-bin-hadoop2.7/python/')  # Not conda
# Careful with Java version
# conda install py4j
Be sure to replace PATH_TO with the path to your own Spark installation.
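If you would rather not hard-code that path, you could build it from the installation directory instead. The following is a minimal sketch, assuming the standard Spark binary layout, where the Python bindings live under python/ and a bundled py4j source archive (py4j-*-src.zip) sits under python/lib/. Adding that archive may make a separate py4j install unnecessary. The helper name add_pyspark_to_path and the use of a SPARK_HOME environment variable are our own illustrative choices, not part of the original recipe.

```python
import glob
import os
import sys


def add_pyspark_to_path(spark_home):
    """Hypothetical helper: append Spark's Python bindings to sys.path.

    Assumes the standard binary layout: <spark_home>/python for PySpark and
    <spark_home>/python/lib/py4j-*-src.zip for the bundled py4j archive.
    Returns the list of paths it considered, in the order they were added.
    """
    python_dir = os.path.join(spark_home, 'python')
    # Any bundled py4j archive found under python/lib (none if the directory is missing)
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, 'lib', 'py4j-*-src.zip')))
    paths = [python_dir] + py4j_zips
    for path in paths:
        if path not in sys.path:  # avoid duplicate entries on repeated calls
            sys.path.append(path)
    return paths
```

For example, add_pyspark_to_path(os.environ.get('SPARK_HOME', '/PATH_TO/spark-2.3.2-bin-hadoop2.7')) falls back to the placeholder path when SPARK_HOME is not set, so PATH_TO still needs editing in that case.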
import pyspark as spark
from pyspark.sql.functions import col, round as round_
We will use PySpark's round function, but we import it as round_ to avoid a clash with Python's built-in round function.
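To see why the alias matters, here is a small pure-Python sketch of the clash. It does not import PySpark: spark_round is a hypothetical stand-in that, like PySpark's round, builds an expression rather than returning a number. Binding it to the name round hides the built-in, whereas binding it to round_ leaves the built-in usable.

```python
def spark_round(col_name, scale=0):
    """Hypothetical stand-in for a column-level round: returns an expression string."""
    return f"round({col_name}, {scale})"


def demo_shadowing():
    round = spark_round          # what `from ... import round` would do in this scope
    return round(3.14159, 2)     # no longer Python's numeric round


round_ = spark_round             # what `import round as round_` does

print(demo_shadowing())          # round(3.14159, 2), an expression string
print(round(3.14159, 2))         # 3.14, the built-in is still intact
print(round_('price', 2))        # round(price, 2)
```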
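# Connect to a standalone Spark master (adjust the URL to your cluster)
sc = spark.SparkContext('spark://127.0.1.1:7077')
# SQLContext provides the DataFrame/SQL API on top of the SparkContext
sqlc = spark.SQLContext(sc)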
There are other contexts for Spark, and we will discuss ...