Scalable Big Data Architecture: A Practitioner’s Guide to Choosing Relevant Big Data Architecture

Name: Scalable Big Data Architecture: A Practitioner’s Guide to Choosing Relevant Big Data Architecture
Author: Bahaaldine Azarmi
ISBN: 9781484213261

January 2016

Intermediate to advanced

160 pages

3h 35m

English

Apress

Cover
Title
Copyright
Dedication
Contents at a glance
Contents
About the Author
About the Technical Reviewers
Chapter 1: The Big (Data) Problem
Identifying Big Data Symptoms; Size Matters; Typical Business Use Cases; Understanding the Big Data Project’s Ecosystem; Hadoop Distribution; Data Acquisition; Processing Language; Machine Learning; NoSQL Stores; Creating the Foundation of a Long-Term Big Data Architecture; Architecture Overview; Log Ingestion Application; Learning Application; Processing Engine; Search Engine; Summary
Chapter 2: Early Big Data with NoSQL
NoSQL Landscape; Key/Value; Column; Document; Graph; NoSQL in Our Use Case; Introducing Couchbase; Architecture; Cluster Manager and Administration Console; Managing Documents; Introducing ElasticSearch; Architecture; Monitoring ElasticSearch; Search with ElasticSearch; Using NoSQL as a Cache in a SQL-based Architecture; Caching Document; ElasticSearch Plug-in for Couchbase with Couchbase XDCR; ElasticSearch Only; Summary

Chapter 3: Defining the Processing Topology
First Approach to Data Architecture; A Little Bit of Background; Dealing with the Data Sources; Processing the Data; Splitting the Architecture; Batch Processing; Stream Processing; The Concept of a Lambda Architecture; Summary
Chapter 4: Streaming Data
Streaming Architecture; Architecture Diagram; Technologies; The Anatomy of the Ingested Data; Clickstream Data; The Raw Data; The Log Generator; Setting Up the Streaming Architecture; Shipping the Logs in Apache Kafka; Draining the Logs from Apache Kafka; Summary
Chapter 5: Querying and Analyzing Patterns
Defining an Analytics Strategy; Continuous Processing; Real-Time Querying; Process and Index Data Using Spark; Preparing the Spark Project; Understanding a Basic Spark Application; Implementing the Spark Streamer; Implementing a Spark Indexer; Implementing a Spark Data Processing; Data Analytics with Elasticsearch; Introduction to the aggregation framework; Visualize Data in Kibana; Summary
Chapter 6: Learning From Your Data?
Introduction to Machine Learning; Supervised Learning; Unsupervised Learning; Machine Learning with Spark; Adding Machine Learning to Our Architecture; Adding Machine Learning to Our Architecture; Enriching the Clickstream Data; Labelizing the Data; Training and Making Prediction; Summary
Chapter 7: Governance Considerations
Dockerizing the Architecture; Introducing Docker; Installing Docker; Creating Your Docker Images; Composing the Architecture; Architecture Scalability; Sizing and Scaling the Architecture; Monitoring the Infrastructure Using the Elastic Stack; Considering Security; Summary
Index

Overview

This book highlights the different types of data architecture and illustrates the many possibilities hidden behind the term "Big Data", from the use of NoSQL databases to the deployment of stream analytics architectures, machine learning, and governance.

Scalable Big Data Architecture covers real-world, concrete industry use cases that leverage complex distributed applications, which involve web applications, RESTful APIs, and high-throughput processing of large amounts of data stored in highly scalable NoSQL data stores such as Couchbase and Elasticsearch. This book demonstrates how data processing can be done at scale, from the use of NoSQL data stores to the combination of Big Data distributions.

When data processing becomes too complex and involves different processing topologies, such as long-running jobs, stream processing, correlation of multiple data sources, and machine learning, it is often necessary to delegate the load to Hadoop or Spark and use the NoSQL store to serve processed data in real time.

This book shows you how to choose a relevant combination of big data technologies available within the Hadoop ecosystem. It focuses on processing long jobs, architecture, stream data patterns, log analysis, and real-time analytics. Every pattern is illustrated with practical examples that use open source projects such as Logstash, Spark, and Kafka.

Traditional data infrastructures are built for digesting and rendering data synthesis and analytics from large amounts of data. This book helps you understand why you should consider using machine learning algorithms early in the project, before being overwhelmed by the constraints imposed by the high throughput of Big Data.

Scalable Big Data Architecture is for developers, data architects, and data scientists looking for a better understanding of how to choose the most relevant pattern for a Big Data project and which tools to integrate into that pattern.
