Architecting Data-Intensive SaaS Applications

by William Waddington, Kevin McGinley, Pui Kei Johnston Chu, Gjorgji Georgievski, Dinesh Kulkarni

May 2021

Beginner to intermediate

67 pages

1h 37m

English

O'Reilly Media, Inc.

1. Data Applications and Why They Matter
Data Applications Defined; Customer 360; IoT; Machine Learning and Data Science; Application Health and Security; Embedded Analytics; Summary
2. What to Look For in a Modern Data Platform
Benefits of Cloud Environments; Cloud-First Versus Cloud-Hosted; Choice of Cloud Service Providers; Support for Relational Databases; Benefits of Relational Databases; Separation of Storage and Compute; Data Sharing; Workload Isolation; Additional Considerations; Reliability; Extensibility; Summary
3. Building Scalable Data Applications
Design Considerations for Data Applications; Design Patterns for Storage; Design Patterns for Compute; Design Patterns for Security; Summary
4. Data Processing
Design Considerations; Raw Versus Conformed Data; Data Lakes and Data Warehouses; Schema Evolution; Other Trade-offs; Best Practices for Data Processing; ETL Versus ELT; Schematization; Loading Data; Serverless Versus Serverful; Batch Versus Streaming; Summary
5. Data Sharing
Data Sharing Approaches; Sharing by Copy; Sharing by Reference; Design Considerations; Sharing Data with Users; Getting Feedback from Users; Data Sharing in Snowflake; Snowflake Data Marketplace; Snowflake Secure Data Sharing in Action: Braze; Summary
6. Summary and Further Reading

Content preview from Architecting Data-Intensive SaaS Applications

Chapter 4. Data Processing

Data applications provide value by processing large volumes of quickly changing raw data to give customers actionable insights and embedded analytical tools. There are many ways to approach data processing, from third-party tools and services to coding and deploying bespoke data pipelines. A modern data platform should support all of these options, letting you choose whichever best meets your needs. In this chapter you will learn how to assess the trade-offs of different data processing methods, so that you can make informed choices about working with the tooling that data platforms provide.

We will start with an overview of design considerations for this space, highlighting the elements you should consider when architecting data processing pipelines as part of a data application. Then we’ll cover best practices and look at some real-world examples of implementing these practices with Snowflake’s Data Cloud.

Design Considerations

Data processing is a sizable task that needs to run with very low latency, little maintenance, and no manual intervention. A data platform that can meet this challenge lets product teams focus on application development instead of managing ingestion processes, and ensures that users get insights as quickly as possible. The considerations presented in this section will guide you in deciding how to approach data processing.

Raw Versus Conformed ...

Publisher Resources

ISBN: 9781098102760
