Preface
In my current role at Google, I get to work alongside data scientists and data engineers in a variety of industries as they move their data processing and analysis methods to the public cloud. Some try to do the same things they do on premises, the same way they do them, just on rented computing resources. The visionary users, though, rethink their systems, transform how they work with data, and are thereby able to innovate faster.
As early as 2011, an article in Harvard Business Review recognized that some of cloud computing’s greatest successes come from allowing groups and communities to work together in ways that were not previously possible. This is now much more widely recognized: an MIT survey in 2017 found that more respondents cited increased agility (45%) than cost savings (34%) as the reason to move to the public cloud. However, this benefit is still not widely achieved. McKinsey estimated in 2021 that companies are leaving behind nearly $1 trillion of value by not looking at the public cloud as a source of transformative value. Therefore, being able to work on a data science project in the cloud is a skill well worth investing in.
In this book, we walk through an example of a cloud-native, transformative, collaborative way of doing data science. You will learn how to implement an end-to-end data pipeline—we will begin with ingesting the data in a serverless way and work our way through data exploration, dashboards, relational databases, and streaming data all ...