Index

A

Aggregations
cubes
flight summary dataset
functions
approx_count_distinct(col)
avg(col)
count(col)
countDistinct(col)
description
min(col), max(col)
Scala language
skewness(col), kurtosis(col)
sum(col)
sumDistinct(col)
variance(col), stddev(col)
grouping
categorical values
collection group values
multiple aggregations
origin_airport and Count Aggregation
origin_state and origin_city, Count Aggregation
RelationalGroupedDataset
levels
operations
pivoting
rollups
state
time windows
AlphaGo
Alternating least squares (ALS) algorithm
Analytic functions
Arbitrary stateful processing
action
flatMapGroupsWithState
handling state timeouts
mapGroupsWithState
structured streaming
Artificial intelligence (AI)

B

Batch data processing
Binarizer transformer
BinaryClassificationEvaluator
Broadcast ...
