Book description
Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build using Google Cloud Platform (GCP). This hands-on guide shows data engineers and data scientists how to implement an end-to-end data pipeline, using statistical and machine learning methods and tools on GCP.
Through the course of this updated second edition, you'll work through a sample business decision by employing a variety of data science approaches. Follow along by implementing these statistical and machine learning solutions in your own project on GCP, and discover how this platform provides a transformative and more collaborative way of doing data science.
You'll learn how to:
- Employ best practices in building highly scalable data and ML pipelines on Google Cloud
- Automate and schedule data ingest using Cloud Run
- Create and populate a dashboard in Data Studio
- Build a real-time analytics pipeline using Pub/Sub, Dataflow, and BigQuery
- Conduct interactive data exploration with BigQuery
- Create a Bayesian model with Spark on Cloud Dataproc
- Forecast time series and do anomaly detection with BigQuery ML
- Aggregate within time windows with Dataflow
- Train explainable machine learning models with Vertex AI
- Operationalize ML with Vertex AI Pipelines
Table of contents
1. Making Better Decisions Based on Data
- Many Similar Decisions
- The Role of Data Scientists
- Best Practices
- A Probabilistic Decision
- Data and Tools
- Summary
2. Ingesting Data into the Cloud
- Airline On-Time Performance Data
- Separation of Compute and Storage
- Ingesting Data
- Loading Data into Google BigQuery
- Scheduling Monthly Downloads
- Summary
- Code Break
3. Creating Compelling Dashboards
- Explain Your Model with Dashboards
- Loading Data into Cloud SQL
- Querying Using BigQuery
- Building Our First Model
- Building a Dashboard
- Summary
4. Streaming Data: Publication and Ingest with Pub/Sub and Dataflow
- Designing the Event Feed
- Time Correction
- Publishing an Event Stream to Cloud Pub/Sub
- Real-Time Stream Processing
- Real-Time Dashboard
- Summary
Product information
- Title: Data Science on the Google Cloud Platform, 2nd Edition
- Author(s):
- Release date: January 2023
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098118938
