Scalable Big Data Architecture: A Practitioner’s Guide to Choosing Relevant Big Data Architecture

by Bahaaldine Azarmi

January 2016

Intermediate to advanced

160 pages

3h 35m

English

Apress

Identifying Big Data Symptoms; Size Matters; Typical Business Use Cases; Understanding the Big Data Project’s Ecosystem; Hadoop Distribution; Data Acquisition; Processing Language; Machine Learning; NoSQL Stores; Creating the Foundation of a Long-Term Big Data Architecture; Architecture Overview; Log Ingestion Application; Learning Application; Processing Engine; Search Engine; Summary
NoSQL Landscape; Key/Value; Column; Document; Graph; NoSQL in Our Use Case; Introducing Couchbase; Architecture; Cluster Manager and Administration Console; Managing Documents; Introducing ElasticSearch; Architecture; Monitoring ElasticSearch; Search with ElasticSearch; Using NoSQL as a Cache in a SQL-based Architecture; Caching Document; ElasticSearch Plug-in for Couchbase with Couchbase XDCR; ElasticSearch Only; Summary

First Approach to Data Architecture; A Little Bit of Background; Dealing with the Data Sources; Processing the Data; Splitting the Architecture; Batch Processing; Stream Processing; The Concept of a Lambda Architecture; Summary
Streaming Architecture; Architecture Diagram; Technologies; The Anatomy of the Ingested Data; Clickstream Data; The Raw Data; The Log Generator; Setting Up the Streaming Architecture; Shipping the Logs in Apache Kafka; Draining the Logs from Apache Kafka; Summary
Defining an Analytics Strategy; Continuous Processing; Real-Time Querying; Process and Index Data Using Spark; Preparing the Spark Project; Understanding a Basic Spark Application; Implementing the Spark Streamer; Implementing a Spark Indexer; Implementing a Spark Data Processing; Data Analytics with Elasticsearch; Introduction to the aggregation framework; Visualize Data in Kibana; Summary
Introduction to Machine Learning; Supervised Learning; Unsupervised Learning; Machine Learning with Spark; Adding Machine Learning to Our Architecture; Adding Machine Learning to Our Architecture; Enriching the Clickstream Data; Labelizing the Data; Training and Making Prediction; Summary
Dockerizing the Architecture; Introducing Docker; Installing Docker; Creating Your Docker Images; Composing the Architecture; Architecture Scalability; Sizing and Scaling the Architecture; Monitoring the Infrastructure Using the Elastic Stack; Considering Security; Summary

Content preview from Scalable Big Data Architecture: A Practitioner’s Guide to Choosing Relevant Big Data Architecture

CHAPTER 4

Streaming Data

In the previous chapter, we focused on a long-term processing job that runs in a Hadoop cluster and leverages YARN or Hive. In this chapter, I would like to introduce you to what I call the 2014 way of processing data: streaming data. Indeed, more and more data processing infrastructures rely on streaming or logging architectures that ingest the data, apply some transformations, and then transport the data to a data persistence layer.
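The ingest, transform, and persist flow described above can be sketched in a few lines. This is only an illustrative outline using the Python standard library, not the chapter's implementation: the names `ingest`, `transform`, `InMemoryStore`, and `run_pipeline`, as well as the `path` and `status` log fields, are assumptions made for the example.

```python
import json
from typing import Iterable, Iterator


def ingest(lines: Iterable[str]) -> Iterator[dict]:
    """Ingestion stage: parse each raw JSON log line as it arrives."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def transform(events: Iterable[dict]) -> Iterator[dict]:
    """Transformation stage: normalize the status and flag error responses."""
    for event in events:
        status = int(event["status"])
        yield {"path": event["path"], "status": status, "is_error": status >= 400}


class InMemoryStore:
    """Hypothetical stand-in for the persistence layer (a list in memory)."""

    def __init__(self):
        self.documents = []

    def index(self, doc: dict) -> None:
        self.documents.append(doc)


def run_pipeline(lines: Iterable[str], store: InMemoryStore) -> int:
    """Stream every event through the stages one at a time; return the count stored."""
    count = 0
    for doc in transform(ingest(lines)):
        store.index(doc)
        count += 1
    return count


if __name__ == "__main__":
    store = InMemoryStore()
    sample = ['{"path": "/home", "status": "200"}', '{"path": "/missing", "status": 404}']
    print(run_pipeline(sample, store), "documents stored")
```

Because each stage is a generator, events flow through one at a time rather than being collected into a batch first, which is the property that distinguishes this style from the long-running jobs of the previous chapter.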

This chapter will focus on three key technologies: Kafka, Spark, and the ELK stack from Elastic. We will work on combining them to implement different kinds of logging architecture ...

ISBN: 9781484213261
