Foundations for Architecting Data Solutions

by Ted Malaska, Jonathan Seidman
September 2018
Beginner to intermediate
187 pages
4h 59m
English
O'Reilly Media, Inc.

Chapter 8. Data Processing

Now that we’ve talked through considerations around building data pipelines, we’ll wrap up with a discussion of processing and analyzing all of the data that’s been gathered via those pipelines. Considerations around collecting, storing, and managing data provide the foundation for any data architecture, but it’s the processing of that data that will allow you to derive value.

Just as with other components used in a distributed data architecture, the challenge with processing is the large number of options available, many of which have different goals and target different use cases. As in Chapter 5, the goal of this chapter is to provide a list of criteria for categorizing processing systems, giving you a framework for evaluating them.

Ultimately, the selection of specific engines will depend on considerations such as your use cases, the experience and knowledge of your team, your target users, SLAs, and the components used elsewhere in your architecture. Our hope in this chapter is to provide an understanding of where different tools fit so that you can make more informed decisions when planning your projects.

Attributes of Processing Engines

The following are attributes that we’ll use throughout this chapter to distinguish various processing engines:

Directed acyclic graph (DAG) management

How does the engine process an execution plan? We’ll provide more detail on what this means momentarily.

Concurrency ...
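To make the DAG management attribute more concrete: an engine represents a job as an execution plan whose steps depend on one another, and that dependency graph must be acyclic so the steps can be ordered. The book’s own detailed explanation is not part of this preview, so what follows is only a minimal, generic sketch, assuming a plan made of hypothetical steps (`read_orders`, `join`, and so on) and using Python’s standard-library `graphlib` (Python 3.9+). It rejects cyclic plans and groups steps into waves whose dependencies are already satisfied. Real processing engines do considerably more (plan optimization, partitioning, fault tolerance), and this is not any particular engine’s scheduler.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical execution plan: each step maps to the set of steps it depends on.
plan = {
    "read_orders": set(),
    "read_customers": set(),
    "filter_orders": {"read_orders"},
    "join": {"filter_orders", "read_customers"},
    "aggregate": {"join"},
    "write_results": {"aggregate"},
}


def schedule_waves(plan):
    """Group plan steps into waves.

    Every step in a wave has all of its dependencies completed by earlier
    waves, so the steps within one wave could, in principle, run concurrently.
    """
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises CycleError if the plan is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


for i, wave in enumerate(schedule_waves(plan)):
    print(i, wave)

# A plan with a cycle cannot be scheduled at all.
try:
    schedule_waves({"a": {"b"}, "b": {"a"}})
except CycleError as err:
    print("rejected cyclic plan:", err.args[0])
```

In this sketch the two independent reads land in the same wave, while the join has to wait for both of its inputs; that dependency-driven ordering is the kind of thing an engine’s DAG management is responsible for.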

ISBN: 9781492038733