Building Real-Time Data Pipelines

by Gary Orenstein, Conor Doherty, Kevin White, Steven Camiña

November 2015

Beginner to intermediate

61 pages

1h 7m

English

O'Reilly Media, Inc.

Introduction
1. When to Use In-Memory Database Management Systems (IMDBMS)
    Improving Traditional Workloads with In-Memory Databases
    Online Transaction Processing (OLTP)
    Online Analytical Processing (OLAP)
    HTAP: Bringing OLTP and OLAP Together
    Modern Workloads
    The Need for HTAP-Capable Systems
    In-Memory Enables HTAP
    Common Application Use Cases
    Real-Time Analytics
    Risk Management
    Personalization
    Portfolio Tracking
    Monitoring and Detection
    Conclusion
2. First Principles of Modern In-Memory Databases
    The Need for a New Approach
    Architectural Principles of Modern In-Memory Databases
    In-Memory
    Distributed Systems
    Relational with Multimodel
    Mixed Media
    Conclusion
3. Moving from Data Silos to Real-Time Data Pipelines
    The Enterprise Architecture Gap
    Real-Time Pipelines and Converged Processing
    Stream Processing, with Context
    Conclusion
4. Processing Transactions and Analytics in a Single Database
    Requirements for Converged Processing
    In-Memory Storage
    Access to Real-Time and Historical Data
    Compiled Query Execution Plans
    Granular Concurrency Control
    Fault Tolerance and ACID Compliance
    Benefits of Converged Processing
    Enabling New Sources of Revenue
    Reducing Administrative and Development Overhead
    Simplifying Infrastructure
    Conclusion
5. Spark
    Background
    Characteristics of Spark
    Understanding Databases and Spark
    Other Use Cases
    Conclusion
6. Architecting Multipurpose Infrastructure
    Multimodal Systems
    Multimodel Systems
    Tiered Storage
    The Real-Time Trinity: Apache Kafka, Spark, and an Operational Database
    Conclusion
7. Getting to Operational Systems
    Have Fewer Systems Doing More
    Modern Technologies Enable Real-Time Programmatic Decision Making
    Modern Technologies Enable Ad-Hoc Reporting on Live Data
    Conclusion
8. Data Persistence and Availability
    Data Durability
    Data Availability
    Data Backups
    Conclusion
9. Choosing the Best Deployment Option
    Considerations for Bare Metal
    Virtual Machine (VM) and Container Considerations
    Orchestration Frameworks
    Considerations for Cloud or On-Premises Deployments
    Benefits of Cloud: Expansion and Flexibility
    Benefits of On-Premises: Control, Security, Performance Optimization, and Predictability
    Choosing the Right Storage Medium
    RAM
    SSD and Disk
    Deployment Conclusions

10. Conclusion
    Recommended Next Steps

Overview

Traditional data processing infrastructures—especially those that support applications—weren’t designed for our mobile, streaming, and online world. This O’Reilly report examines how today’s distributed, in-memory database management systems (IMDBMS) enable you to make quick decisions based on real-time data.

In this report, executives from MemSQL Inc. provide options for using in-memory architectures to build real-time data pipelines. If you want to instantly track user behavior on websites or mobile apps, generate reports on a changing dataset, or detect anomalous activity in your system as it occurs, you’ll learn valuable lessons from some of the largest and most successful tech companies focused on in-memory databases.

Explore the architectural principles of modern in-memory databases
Understand what’s involved in moving from data silos to real-time data pipelines
Run transactions and analytics in a single database, without ETL
Minimize complexity by architecting a multipurpose data infrastructure
Learn guiding principles for developing an optimally architected operational system
Provide persistence and high availability mechanisms for real-time data
Choose an in-memory architecture flexible enough to scale across a variety of deployment options

Conor Doherty, Data Engineer at MemSQL, is responsible for creating content around database innovation, analytics, and distributed systems.

Gary Orenstein, Chief Marketing Officer at MemSQL, leads marketing strategy, product management, communications, and customer engagement.

Kevin White is the Director of Operations and a content contributor at MemSQL.

Steven Camiña is a Principal Product Manager at MemSQL. His experience spans B2B enterprise solutions, including databases and middleware platforms.

ISBN: 9781491975879
