Architecting Data-Intensive SaaS Applications
by William Waddington, Kevin McGinley, Pui Kei Johnston Chu, Gjorgji Georgievski, Dinesh Kulkarni
Chapter 4. Data Processing
Data applications deliver value by processing large volumes of quickly changing raw data to give customers actionable insights and embedded analytical tools. There are many ways to approach data processing, from third-party tools and services to coding and deploying bespoke data pipelines. A modern data platform should support all of these options, giving you the power to choose whichever best meets your needs. In this chapter you will learn how to assess the trade-offs of different data processing methods so that you can make informed choices about working with the tooling that data platforms provide.
We will start with an overview of design considerations for this space, highlighting the elements you should consider when architecting data processing pipelines as part of a data application. Then we’ll cover best practices and look at some real-world examples of implementing these practices with Snowflake’s Data Cloud.
Design Considerations
Data processing is a sizable task that must run with very low latency, require little maintenance, and need no manual intervention. A data platform that can meet this challenge will enable product teams to focus on application development instead of managing ingestion processes, and will ensure that users get insights as quickly as possible. The considerations presented in this section will guide your approach to data processing.
Raw Versus Conformed ...