
Designing Big Data Platforms

by Yusuf Aytas

July 2021

Beginner to intermediate

336 pages

9h 22m

English

Wiley


1.1 Defining Modern Big Data Platform
1.2 Fundamentals of a Modern Big Data Platform

2.1 A Bit of History
2.2 What Makes Big Data
2.3 Components of Big Data Architecture
2.4 Making Use of Big Data

3.1 Problem Definition
3.2 Processing Large Data with Linux Commands
3.3 Processing Large Data with PostgreSQL
3.4 Cost of Big Data

4.1 Big Data Storage Patterns
4.2 On‐Premise Storage Solutions
4.3 Cloud Storage Solutions
4.4 Hybrid Storage Solutions

5.1 Defining Offline Data Processing
5.2 MapReduce Technologies
5.3 Apache Spark
5.4 Apache Flink
5.5 Presto

6.1 The Need for Stream Processing
6.2 Defining Stream Data Processing
6.3 Streams via Message Brokers
6.4 Streams via Stream Engines

7.1 Log Collection
7.2 Transferring Big Data Sets
7.3 Aggregating Big Data Sets
7.4 Data Pipeline Scheduler
7.5 Patterns and Practices
7.6 Exploring Data Visually

8.1 Data Science Applications
8.2 Data Science Life Cycle
8.3 Data Science Toolbox
8.4 Productionalizing Data Science

9.1 Need for Data Discovery
9.2 Data Governance
9.3 Data Discovery Tools

10.1 Infrastructure Security
10.2 Data Privacy
10.3 Law Enforcement
10.4 Data Security Tools

11.1 Platforms
11.2 Big Data Systems and Tools
11.3 Challenges

12.1 Event Sourcing
12.2 Kappa Architecture
12.3 Data Mesh
12.4 Data Reservoirs
12.5 Data Catalog
12.6 Self‐service Platform
12.7 Abstraction
12.8 Data Guild
12.9 Trade‐offs
12.10 Data Ethics

A.1 Lambda Architecture
A.2 Apache Cassandra
A.3 Apache Beam

B.1 Activity Tracking Recipe
B.2 Data Quality Assurance
B.3 Estimating Time to Delivery
B.4 Incident Response Recipe
B.5 Leveraging Spark SQL Metrics
B.6 Airbnb Price Prediction


3 A Minimal Data Processing and Management System

After reading this chapter, you should be able to:

Use Unix tooling for data processing

Build a pipeline to extract information

Automate common data tasks

Use PostgreSQL for large data

Overcomplicating a problem is a common issue in computing, and it is genuinely hard to “keep it simple, stupid.” Experience and time are essential for formulating solutions that are good enough for the problem at hand. In this chapter, we develop solutions for relatively large data using standard tools and open‐source software.

3.1 Problem Definition

This chapter focuses on defining a problem and solving it with different strategies. Let us assume we are hosting an online book store.

3.1.1 Online Book Store

Imagine a scenario where we run an online book store in which consumers buy books directly from the seller in real time. The book store has an interface for searching books and a landing page with some recommended books. As users navigate through the website, they can add books to the shopping basket. At any time, a user can decide to buy the books and check out. If the user is not registered, the website first attempts to register them. After that, the user chooses a shipping address. Finally, the website offers payment options.
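The checkout process above is essentially a small state machine, which is useful to keep in mind before analyzing how users move through it. The following is a minimal sketch under stated assumptions: the step names (`BROWSING`, `CHECKOUT`, `REGISTRATION`, `SHIPPING`, `PAYMENT`) and the helper functions `next_step` and `walk` are hypothetical labels chosen for illustration, not part of the book store described in the text. It deliberately collapses searching, the landing page, and adding to the basket into a single browsing step.

```python
from enum import Enum, auto
from typing import List


class Step(Enum):
    """Hypothetical stages of the online book store flow (illustrative only)."""
    BROWSING = auto()      # searching, landing page, adding books to the basket
    CHECKOUT = auto()      # the user has decided to buy
    REGISTRATION = auto()  # only visited by unregistered users
    SHIPPING = auto()      # choosing a shipping address
    PAYMENT = auto()       # choosing a payment option (final step)


def next_step(step: Step, registered: bool) -> Step:
    """Return the step that follows `step` for a registered or unregistered user."""
    if step is Step.BROWSING:
        return Step.CHECKOUT
    if step is Step.CHECKOUT:
        # Unregistered users must register before choosing a shipping address.
        return Step.SHIPPING if registered else Step.REGISTRATION
    if step is Step.REGISTRATION:
        return Step.SHIPPING
    if step is Step.SHIPPING:
        return Step.PAYMENT
    raise ValueError("PAYMENT is the final step")


def walk(registered: bool) -> List[Step]:
    """List every step a user passes through, from browsing to payment."""
    steps = [Step.BROWSING]
    while steps[-1] is not Step.PAYMENT:
        steps.append(next_step(steps[-1], registered))
    return steps


if __name__ == "__main__":
    print([s.name for s in walk(registered=False)])
    print([s.name for s in walk(registered=True)])
```

Running the script prints `['BROWSING', 'CHECKOUT', 'REGISTRATION', 'SHIPPING', 'PAYMENT']` for an unregistered user and `['BROWSING', 'CHECKOUT', 'SHIPPING', 'PAYMENT']` for a registered one, which makes the extra registration step for new customers explicit.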

3.1.2 User Flow Optimization

Special attention is given to user flow optimization. We aim to have a better understanding ...


ISBN: 9781119690924
