Next-Generation Big Data: A Practical Guide to Apache Kudu, Impala, and Spark

Index

A

Active-active dual ingest, Kafka

MirrorMaker

Spark Streaming

StreamSets

Alluxio

administering

master

worker

Apache Spark and

architecture

components

client

primary master

secondary master

worker

installation

use

big data processing performance and scalability

high availability and persistence

memory usage and minimize garbage collection

multiple frameworks and applications

reduce hardware requirements

Alteryx

Browse data tool

City field

CSV format

Customer Segment field

Input Data tool

Output Data tool

selecting files

Select tool

Sort tool

Tool Palette

Amazon Elastic MapReduce (EMR)

Amazon Web Services (AWS)

Cloudera on

Amazon EMR

architecture

on Azure and GCP

Cloudera Altus

Databricks

EBS

EC2 instance

ephemeral or instance storage

regions and availability zones

security groups ...

Next-Generation Big Data: A Practical Guide to Apache Kudu, Impala, and Spark by Butch Quinto
