Learning Google BigQuery

Book Description

Get a fundamental understanding of how Google BigQuery works by analyzing and querying large datasets

About This Book

  • Get started with the BigQuery API and write custom applications using it
  • Learn how the BigQuery API can be used for storing, managing, and querying massive datasets with ease
  • A practical guide with examples and use-cases to teach you everything you need to know about Google BigQuery

Who This Book Is For

If you are a developer, data analyst, or data scientist looking to run complex queries over thousands of records in seconds, this book will help you. No prior experience with BigQuery is assumed.

What You Will Learn

  • Get a hands-on introduction to Google Cloud Platform and its services
  • Understand the different data types supported by Google BigQuery
  • Migrate your enterprise data to BigQuery and query it using both legacy SQL and standard SQL
  • Use partition tables in your project and query external data sources and wildcard tables
  • Create tables and datasets dynamically using the BigQuery API
  • Insert records in real time for analytics using Python and C#
  • Visualize your BigQuery data by connecting it to third-party tools such as Tableau and R
  • Master Google Cloud Pub/Sub for implementing real-time reporting and analytics of your Big Data

In Detail

Google BigQuery is a popular cloud data warehouse for large-scale data analytics. This book will serve as a comprehensive guide to mastering BigQuery and using it to quickly and efficiently get useful insights from your Big Data.

You will begin with a quick overview of the Google Cloud Platform (GCP) and the various services it supports. Then, you will be introduced to the Google BigQuery API and how it fits within the framework of GCP. The book covers useful techniques to migrate your existing data from your enterprise to Google BigQuery, as well as to prepare and optimize it for analysis. You will perform basic as well as advanced data querying using BigQuery, and connect the results to third-party tools such as R and Tableau for reporting and visualization. If you're looking to implement real-time reporting on the streaming data running in your enterprise, this book will also help you.

This book also provides tips, best practices, and mistakes to avoid while working with Google BigQuery and the services that interact with it. By the time you're done with it, you will have a solid foundation in working with BigQuery to solve even the trickiest of data problems.

Style and Approach

This book follows a step-by-step approach to teach readers the concepts of Google BigQuery using SQL. To explain various data querying processes, large-scale datasets are used wherever required.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Errata
      3. Piracy
      4. Questions
  2. Google Cloud and Google BigQuery
    1. Getting started with Google Cloud
      1. Overviewing Google Cloud Platform services
        1. Google Cloud Storage and its features
        2. Learning Google BigQuery
          1. Working with the browser
      2. Running your first query
      3. BigQuery public datasets
      4. Getting started with Cloud SQL 
      5. Cloud Datastore
      6. Google App Engine
        1. App Engine standard environment
        2. App Engine flexible environment
      7. Google Container Engine
      8. Google Compute Engine
    2. Summary
  3. Google Cloud SDK
    1. Installing Google Cloud SDK
      1. Installing Google Cloud SDK on Windows
      2. Installing Google Cloud SDK on macOS
      3. Installing Google Cloud SDK on Linux
    2. gsutil for Google Cloud Storage
    3. Using the bq utility for BigQuery
    4. Using the gcloud utility
    5. Connecting to Cloud SQL using gcloud
      1. Authorizing the client machine via Google Cloud Console
      2. Connecting using a proxy script
      3. Exporting Cloud SQL databases and tables
    6. Deploying to Google App Engine
    7. Summary
  4. Google BigQuery Data Types
    1. Supported data types
    2. Data type considerations
    3. Converting data
    4. Sanitizing data
    5. When to transform your data? Before or after loading to BigQuery?
    6. Arithmetic Operators
    7. Comparison Operators
    8. Date Time Functions
    9. String Functions
    10. Regular Expression Functions
    11. Functions for transformation
    12. Mastering transformation with User-Defined Functions
      1. Some considerations when using UDFs
      2. UDF format
    13. Summary
    14. Further Reading
  5. BigQuery SQL Basic
    1. The BigQuery interface
      1. Error checking
      2. Querying in BigQuery
        1. Types of queries
        2. Querying public data
      3. Basic SQL syntax
        1. Commenting in BigQuery SQL
        2. SELECT
        3. FROM 
        4. WHERE 
        5. GROUP BY 
        6. ORDER BY
        7. HAVING 
        8. Qualifying tables in query
        9. DISTINCT
      4. BigQuery SQL functions
        1. WITHIN
        2. OMIT RECORD IF
        3. ROLLUP
      5. Joining tables in BigQuery
        1. Inner join
        2. Left Outer join
        3. Right Outer join
        4. Full Outer join
        5. Cross join
        6. UNION, UNION ALL, and UNION DISTINCT
      6. Adding your own data in BigQuery
        1. Creating a table
        2. Inserting data to a table
      7. Updating data in a table
        1. Resetting a value
      8. Deleting data from a table
    2. Summary
    3. Further reading
  6. BigQuery SQL Advanced
    1. Partition tables
      1. Creating a partition table using a GUI
      2. Creating a partition table using Google Cloud SDK
      3. Querying data in a partition table
      4. Using partition tables in your projects
    2. Querying external data sources using BigQuery
      1. Creating the table definition
      2. Querying data from external data sources
    3. Wildcard tables
    4. User-defined functions
    5. Views
    6. Querying nested and repeated records
    7. Summary
    8. Further reading
  7. Google BigQuery API
    1. Accessing Google BigQuery
      1. Introducing Google APIs explorer
      2. Getting credentials for API access
        1. Creating a service account
    2. Programming with BigQuery API in C# .NET
      1. Authenticating the service account
      2. Listing all datasets and all tables in the project
      3. Creating a new dataset in the project
      4. Creating a new table within a dataset
      5. Loading data from a file in Google Cloud Storage to a BigQuery table
      6. Executing a query and displaying the result
      7. Executing the query and saving the result in a new table
      8. Streaming insert of rows
    3. Programming with BigQuery API in Python
      1. Listing all datasets and all tables in the project
      2. Creating a new dataset in the project
      3. Creating a new table within a dataset
      4. Importing data from a file in Google Cloud Storage to a BigQuery table
      5. Executing a query and displaying the result
      6. Execute query and copy results to a new table
      7. Streaming insert of rows
    4. Roles and permissions
    5. Summary
  8. Visualizing BigQuery Data
    1. Why is data visualization important?
    2. The danger of summary statistics
    3. Making data visualization work for you
    4. Three tools for visualizing BigQuery data
      1. Simple yet basic – Google Data Studio
        1. Getting started
          1. Making a scatterplot in Data Studio
          2. Making a map in Data Studio
          3. Other features of Data Studio
      2. Simple, fairly flexible, but with a cost – Tableau
        1. Getting started
        2. Map charts in Tableau
        3. Create a word cloud in Tableau
      3. Complex but with considerable flexibility – the R programming language
        1. Getting started
    5. Summary
  9. Google Cloud Pub/Sub
    1. Introduction
    2. Getting started with Cloud Pub/Sub
    3. Cloud Pub/Sub via Google Cloud Console
    4. Cloud Pub/Sub via Google Cloud SDK
    5. Cloud Pub/Sub pricing
    6. Message output formats
    7. Importing message data into BigQuery
    8. Google Cloud Dataprep
    9. Summary
    10. Further reading