Index

A

Active-active dual ingest, Kafka
MirrorMaker
Spark streaming
StreamSets
Alluxio
administering
master
worker
Apache Spark and
architecture
components
client
primary master
secondary master
worker
installation
use
big data processing performance and scalability
high availability and persistence
memory usage and minimize garbage collection
multiple frameworks and applications
reduce hardware requirements
Alteryx
Browse data tool
City field
CSV format
Customer Segment field
Input Data tool
Output Data tool
selecting files
Select tool
Sort tool
Tool Palette
Amazon Elastic MapReduce (EMR)
Amazon Web Services (AWS)
Cloudera on
Amazon EMR
architecture
on Azure and GCP
Cloudera Altus
Databricks
EBS
EC2 instance
ephemeral or instance storage
regions and availability zones
S3
security groups ...
