Index
A
Access control lists (ACLs)
Amazon’s Simple Storage Service (S3)
Amazon Web Services (AWS)
Analytical processing and insights
columnar aggregation
array explode
coffee_orders
complex summary statistics
hierarchical aggregation
many rows from one
OrderItems
order total
pivoting
simple summary statistics
Spark DSL
create note
data aggregation
environment
adding file system
build command
container dependencies
Multi-stage Docker
spinning up
Zeppelin container
Zeppelin-Spark directory layout
grouped datasets
Spark functions
See Spark functions
Analytical Window functions
vs. aggregation
create specification
ordered index
row_number/sum
transaction difference
Analytics
Apache Airflow
APIs
batch jobs
code-driven approach
components
operators
schedulers/executors
tasks
creating user
data ...

Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications