Active-active dual ingest, Kafka
    MirrorMaker
    Spark Streaming
    StreamSets
Alluxio
    administering
        master
        worker
    Apache Spark and
    architecture
    components
        client
        primary master
        secondary master
        worker
    installation
    use
        big data processing performance and scalability
        high availability and persistence
        memory usage and minimize garbage collection
        multiple frameworks and applications
        reduce hardware requirements
Amazon Elastic MapReduce (EMR)
Amazon Web Services (AWS)
    Cloudera on
        Amazon EMR
        architecture
        on Azure and GCP
        Cloudera Altus
    Databricks
    EBS
    EC2 instance
    ephemeral or instance storage
    regions and availability zones
    S3
    security groups ...