August 2019
Intermediate to advanced
560 pages
13h 41m
English
There are two source files: eshadoop/src/com/example/spark/run.py and eshadoop/src/com/example/spark_ml/kmeans.py. First, let's look at the main workflow in run.py, shown in the following code block:
from pyspark.sql import SparkSession
import pyspark.sql.functions as f
from pyspark.sql.types import *
from pyspark.sql.functions import expr, lit
from pyspark.ml.feature import VectorAssembler
from com.example.spark_ml.kmeans import create_anomaly_detection_model, find_anomalies
import pandas
......
if __name__ == '__main__':
    spark_session = create_spark_session()
    df_data, es_data = extract_es_data(spark_session)
    df_labels, centers = create_anomaly_detection_model(es_data)
    write_es_data(df_data, ...
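The listing above is cut off, and kmeans.py is not shown in this section, so the following is only a rough sketch of one common way cluster centers can be used to flag anomalies. It assumes that a point counts as anomalous when its distance to the nearest center exceeds a threshold. The function names (nearest_center, flag_anomalies), the sample data, and the threshold are all hypothetical, and the sketch uses plain Python rather than Spark, so it should not be read as the book's implementation of find_anomalies.

```python
import math


def nearest_center(point, centers):
    """Return (index, distance) of the center closest to the point."""
    distances = [math.dist(point, center) for center in centers]
    index = min(range(len(centers)), key=distances.__getitem__)
    return index, distances[index]


def flag_anomalies(points, centers, threshold):
    """Return (point, center_index, distance) for every point that lies
    farther than `threshold` from its nearest center.

    Assumption: distance to the nearest k-means center is the anomaly
    score. The source does not confirm this is how kmeans.py works.
    """
    flagged = []
    for point in points:
        index, distance = nearest_center(point, centers)
        if distance > threshold:
            flagged.append((point, index, distance))
    return flagged


# Hypothetical centers, as a k-means model might produce them.
centers = [(0.0, 0.0), (10.0, 10.0)]

# Hypothetical observations: two close to a center, two far away.
points = [(0.5, 0.5), (9.0, 10.0), (4.0, 5.0), (10.0, 13.0)]

anomalies = flag_anomalies(points, centers, threshold=2.0)
for point, index, distance in anomalies:
    print(f"{point} is {distance:.2f} away from center {index}")
```

In a Spark job the same idea would normally be expressed over DataFrame columns rather than Python lists, but the distance-and-threshold logic stays the same.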