Cloud Analytics with Google Cloud Platform

Book Description

Combine the power of analytics and cloud computing for faster, more efficient insights

About This Book
  • Master the concept of analytics on the cloud and how organizations are using it
  • Learn the design considerations while applying a cloud analytics solution
  • Design an end-to-end analytics pipeline on the cloud
Who This Book Is For

This book is targeted at CIOs, CTOs, and even analytics professionals looking for various alternatives to implement their analytics pipeline on the cloud. Data professionals looking to get started with cloud-based analytics will also find this book useful. Some basic exposure to cloud platforms such as GCP will be helpful, but not mandatory.

What You Will Learn
  • Explore the basics of cloud analytics and the major cloud solutions
  • Learn how organizations are using cloud analytics to improve the ROI
  • Explore the design considerations while adopting cloud services
  • Work with the ingestion and storage tools of GCP such as Cloud Pub/Sub
  • Process your data with tools such as Cloud Dataproc and BigQuery
  • Explore over 70 GCP tools to build an analytics engine for cloud analytics
  • Implement machine learning and other AI techniques on GCP
In Detail

With the ongoing data explosion, more and more organizations all over the world are slowly migrating their infrastructure to the cloud. These cloud platforms also provide their distinct analytics services to help you get faster insights from your data.

This book will give you an introduction to the concept of analytics on the cloud, and the different cloud services popularly used for processing and analyzing data. If you're planning to adopt the cloud analytics model for your business, this book will help you understand the design and business considerations to be kept in mind, and choose the best tools and alternatives for analytics, based on your requirements. The chapters in this book will take you through the 70+ services available in Google Cloud Platform and their implementation for practical purposes. From ingestion to processing your data, this book contains best practices on building an end-to-end analytics pipeline on the cloud by leveraging popular concepts such as machine learning and deep learning.

By the end of this book, you will have a better understanding of cloud analytics as a concept, as well as practical know-how of its implementation.

Style and approach

A comprehensive guide that blends theory, examples, and implementation of real-world use cases

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Cloud Analytics with Google Cloud Platform
  3. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  4. Foreword
  5. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the color images
      2. Conventions used
    4. Get in touch
      1. Reviews
  7. Introducing Cloud Analytics
    1. What is cloud computing?
    2. Major benefits of cloud computing
    3. Cloud computing deployment models
      1. Private cloud
      2. Public cloud
      3. Hybrid cloud
      4. Differences between the private cloud, hybrid cloud, and public cloud models
    4. Types of cloud computing services
      1. Infrastructure as a Service
      2. PaaS
      3. SaaS
      4. Differences between SaaS, PaaS, and IaaS
    5. How PaaS, IaaS, and SaaS are separated at service level
    6. Emerging cloud technologies and services
      1. Different ways to secure the cloud
    7. Risks and challenges with the cloud
    8. What is cloud analytics?
    9. 10 major cloud vendors in the world
    10. Google Cloud Platform introduction—video
    11. Summary
  8. Design and Business Considerations
    1. A bit more about cloud computing and migration
    2. Parameters before adopting cloud strategy
      1. Developing and changing business needs
      2. Security of data
      3. Organizational requests on the in-house IT team
      4. Cloud deployment models—public cloud, private cloud, and hybrid cloud
      5. Legally binding responsibilities
    3. Prerequisites for an application to be moved to the cloud
      1. Performance
      2. Portability
      3. Simplifying cloud migration with virtualization
    4. Infrastructure contemplation for cloud
    5. Available deployment models while moving to cloud
      1. IaaS
        1. Advantages of IaaS
        2. Disadvantages of IaaS
      2. PaaS
        1. Advantages of PaaS
        2. Disadvantages of PaaS
      3. SaaS
    6. Cloud migration checklist
    7. Architecture of a cloud computing ecosystem
      1. Infrastructure for cloud computing
      2. Constrictions on cloud infrastructure
    8. Applications of cloud computing
    9. Preparing a plan for moving to cloud computing
      1. Methodology stage
        1. Cloud computing proposal worth
        2. Cloud computing methodology planning
      2. Planning stage
      3. Distribution stage
      4. Making arrangements for a multi-provider methodology
      5. Making a multi-provider design tactic
    10. Technologies utilized by cloud computing
      1. Grid computing
      2. Service-oriented architecture
      3. Virtualization
      4. Utility computing
    11. Summary
  9. GCP 10,000 Feet Above – A High-Level Understanding of GCP
    1. Different services offered by typical cloud vendors
    2. Understanding cloud categories
      1. Compute
        1. Compute Engine
        2. App engine
        3. Kubernetes engine
        4. Cloud function
      2. Storage and databases
        1. Cloud storage
        2. Cloud SQL
        3. Cloud Bigtable
        4. Cloud Spanner
        5. Cloud Datastore
        6. Persistent Disk
      3. Networking
        1. Virtual Private Cloud
        2. Cloud load balancing
        3. Cloud CDN
        4. Cloud interconnect
        5. Cloud DNS
        6. Network Service Tiers ALPHA
      4. Big Data
        1. BigQuery
        2. Cloud Dataflow
        3. Cloud Dataproc
        4. Cloud Datalab
        5. Cloud Dataprep BETA
        6. Cloud Pub/Sub
        7. Genomics
        8. Google Data Studio BETA
      5. Data transfer
        1. Google Transfer appliance
        2. Cloud Storage Transfer Service
        3. Google BigQuery Data Transfer Service
      6. Cloud AI
        1. Cloud AutoML alpha
        2. Cloud TPU beta
        3. Cloud machine learning engine
        4. Cloud job discovery private beta
        5. Dialogflow enterprise edition beta
        6. Cloud natural language
        7. Cloud speech API, translation API, and vision API
        8. Cloud video intelligence
      7. Internet of Things
        1. Cloud IoT Core beta
      8. Management tools
        1. Stackdriver overview
        2. Monitoring, logging, error reporting, trace, and debugger
        3. Cloud deployment manager
        4. Cloud console
        5. Cloud shell
        6. Cloud console mobile app
      9. Developer tools
        1. Cloud SDK
        2. Container Registry
        3. Container builder
        4. Cloud source repositories
        5. Cloud tools for IntelliJ, Visual Studio, and Eclipse
        6. Cloud tools for Powershell
    3. Overview to Google Cloud Platform Console—Video
    4. Summary
  10. Ingestion and Storing – Bring the Data and Capture It
    1. Cloud Dataflow
      1. When to use
      2. Special features
      3. The Dataflow programming model
        1. Pipelines
        2. PCollection (data)
        3. Transforms
        4. I/O sources and sinks
      4. Pipeline example
      5. How to use Cloud Dataflow - Video
    2. Cloud Pub/Sub
      1. When to use
      2. Special feature
      3. Overview
      4. Using the gcloud command-line tool
      5. How to use Cloud Pub Sub - Video
    3. Cloud storage
      1. When to use it
      2. Special feature
      3. Cloud storage classes
        1. Multi-regional storage
        2. Regional storage
        3. Nearline storage
        4. Coldline storage
        5. Standard storage
      4. Working with storages
      5. How to use Cloud Storage - Video
    4. Cloud SQL
      1. When to use
      2. Special feature
      3. Database engine (MySQL)
      4. Database engine (PostgreSQL)
      5. How to use Cloud SQL - Video
    5. Cloud BigTable
      1. When to use it
      2. Special features
      3. Cloud BigTable storage model
      4. Cloud Bigtable architecture
      5. Load balancing
      6. How to use Cloud Bigtable—Video
    6. Cloud Spanner
      1. When to use
      2. Special features
      3. Schema and data model
      4. Instances
      5. How to use Cloud Spanner - Video
    7. Cloud Datastore
      1. When to use
      2. Special features
      3. How to use Cloud Datastore - Video
    8. Persistent disks
      1. When to use
      2. Special feature
        1. Standard hard disk drive
        2. Solid-state drives
        3. Persistent disk
      3. How to attach Persistent Store to VM - Video
    9. Summary
  11. Processing and Visualizing – Close Encounter
    1. Google BigQuery
      1. Storing data in BigQuery
      2. Features of BigQuery
      3. Choosing a data ingestion format
        1. Schema type of the data
        2. External limitations
        3. Embedded newlines
      4. Supported data formats
        1. Google Cloud Storage
        2. Readable data source
      5. Use case
      6. How to use Google BigQuery - Video
    2. Cloud Dataproc
      1. When to use it
      2. Features of Dataproc
        1. Super-fast to build the cluster
        2. Low cost
        3. Easily integrated with other components
      3. Available versions and supported components of Cloud DataProc
      4. Accessibility of Google Cloud Dataproc
      5. Placement of Dataproc
      6. Dataflow versus Dataproc
      7. Pricing
      8. How to use Cloud Dataproc - Video
    3. Google Cloud Datalab
      1. Features of Cloud DataLab
        1. Multi-language support
        2. Integration with multiple Google services
        3. Interactive data visualization
        4. Machine learning
      2. Use case
      3. How to use Google Cloud Datalab - Video
    4. Google Data Studio
      1. Features of Data Studio
        1. Data connections
        2. Data visualization and customization
        3. Usability
        4. Data transformation
        5. Sharing and collaboration
        6. Report templates
        7. Report customization
      2. The flow of Data Studio
      3. How to use Google Data Studio - Video
    5. Google Compute Engine
      1. Features
      2. Advantages of Compute Engine
        1. Batch processing
        2. Predefined machine types
        3. Persistent disks
        4. Linux and Windows support
        5. Per-second billing
      3. Types of Compute Engine
        1. Quickstart VM
        2. Custom VM
        3. Preemptible VM
      4. Use case
      5. How to use Google Compute Engine - Video
    6. Google App Engine
      1. Characteristics of flexible and standard environments
      2. Google AppEngine architecture
      3. Features
        1. Multiple language support
        2. Application versioning
        3. Fully managed
        4. Application security
        5. Traffic splitting
      4. Use case
      5. How to use Google App Engine - Video
    7. Google Container Engine
      1. Container cluster architecture
        1. Cluster master
        2. Cluster master and the Kubernetes API
      2. Master and node interaction
        1. Nodes
        2. Node machine type
      3. How to use Google Container Engine - Video
    8. Google Cloud Functions
      1. Connecting and extending cloud services
      2. Functions are serverless
      3. Use cases
        1. IoT
        2. Data processing ETL
        3. Mobile backend
      4. How to use Google Cloud Functions - Video
    9. Summary
  12. Machine Learning, Deep Learning, and AI on GCP
    1. Artificial intelligence
    2. Machine learning
    3. Google Cloud Platform
    4. Google Cloud Machine Learning Engine
      1. Pricing
    5. Cloud Natural Language API
      1. Use Cases
        1. Using the goodbooks data set from GitHub
        2. Using GCP services list and classify text based on categories
        3. State choice management
      2. How to use Natural Language API - Video
    6. TensorFlow
      1. Use case—text summarization
    7. Cloud Speech API
      1. How to use Speech API - Video
    8. Cloud Translation API
      1. Use cases
        1. Rule-based Machine Translation
        2. Local tissue response to injury and trauma
      2. How to use Translation API - Video
    9. Cloud Vision API
      1. Use cases
        1. Image detection using Android or iOS mobile device
        2. Retinal Image Analysis – ophthalmology
      2. How to use Vision API - Video
    10. Cloud Video Intelligence
    11. Dialogflow
      1. Use cases
        1. Interactive Voice Response System customer service
        2. Checkout free shopping
    12. AutoML
      1. Use case – Listening to music by fingerprinting
    13. Summary
  13. Guidance on Google Cloud Platform Certification
    1. Professional Cloud Architect Certification
      1. Topics for cloud architect certification
        1. Cloud virtual network
        2. Google Compute Engine
        3. Cloud IAM
        4. Data Storage Services
        5. Resource management and resource monitoring
        6. Interconnecting network and load balancing
        7. Autoscaling
        8. Infrastructure automation with Cloud API and Deployment Manager
        9. Managed services
        10. Application infra services
        11. Application development services
        12. Containers
      2. Job role description
      3. Certification preparation
      4. Sample questions
      5. Use cases
    2. Professional Data Engineer Certification
      1. Topics for Cloud Data Engineer Certification
        1. BigQuery
        2. Dataflow
        3. Dataproc
        4. Machine Learning API and TensorFlow
        5. Stream Pipeline, Streaming Analytics, and Dashboards
      2. Job role description
      3. Certification preparation
      4. Sample questions
      5. Use cases
    3. When to use What
      1. Choosing Cloud Storage
      2. Choosing Cloud SQL
      3. Choosing Cloud Spanner
      4. Choosing DataStore
      5. Choosing BigTable
      6. Choosing right data storage
      7. Dataproc versus Dataflow
      8. Data Peering versus Carrier Peering versus IPSec VPN versus Dedicated Interconnect
    4. Summary
  14. Business Use Cases
    1. Smart Parking Solution by Mark N Park
      1. Abstract
      2. Introduction
      3. Problems
      4. Brainstorming
        1. Collection of sensor data in real time
        2. Updating the right dataset/database
        3. Storing periodic data
        4. Transmitting the data to the end user
        5. Reports and dashboard output required
        6. Scaling infrastructure
      5. Services
      6. Architecture
      7. Conclusion
    2. DSS for web mining recommendation using TensorFlow
      1. Abstract
      2. Introduction
      3. Problems
      4. Brainstorming
        1. Internet bandwidth
        2. Local systems or mobile hardware configuration
        3. Collection of data in real time
        4. Updating the right database
        5. Storing periodic data
        6. Extracting the data to the end user
        7. Report generation as per requirements of the end user
        8. Scaling of infrastructure
      5. Services
      6. Architecture
      7. Advantages of using TensorFlow
      8. Limitations of TensorFlow
      9. Conclusion
    3. Building a Data Lake for a Telecom Client
      1. Abstract
      2. Introduction
      3. Problems
      4. Brainstorming
        1. Challenges from phase 1
          1. Identify source type (Batch or RDBMS or Stream)
          2. Logging was carried out and logs were created for every file that was covered
          3. RDBMS import to create files automatically for MySQL and Postgres
          4. Ingesting live data into GCP
          5. Different destinations for all the data sources
          6. Code repository
        2. Challenges from phase 2
          1. Building Hadoop cluster
          2. Data ingestion prioritization and then ingestion
          3. Building strict policies between Data Lake and Hadoop cluster users
          4. Maintaining high availability, enabled load balancer, auto scaled, and secured cluster
          5. Maintaining cluster health
          6. Alpha phase is bringing data from the Data Lake into an application cluster
          7. Beta phase includes cleaning of data
          8. Gamma phase performs transformation
          9. Delta phase graphs and reports are generated on multiple BI tools
          10. Code repository
      5. Services
      6. Architecture
      7. Conclusion
    4. Summary
  15. Introduction to AWS and Azure
    1. Amazon Web Services
      1. Compute
      2. Storage
      3. Database
      4. Networking and content
      5. Developer tools
      6. Management tools
      7. Machine learning
      8. Analytics
      9. Security, identity, and compliance
      10. Internet of Things
      11. Migration
      12. Other services
      13. Overview to AWS Services
    2. Microsoft Azure
      1. Compute
      2. Networking
      3. Storage
      4. Web and mobile
      5. Containers
      6. Databases
      7. Analytics
      8. AI and machine learning
      9. Internet of Things
      10. Security and Identity
      11. Developer Tools
      12. Management Tools
      13. Overview to Azure Services
    3. Head to head of Google Cloud Platform with Amazon Web Services and Microsoft Azure
      1. Compute
      2. Storage
      3. Database
      4. Analytics and big data
      5. Internet of Things
      6. Mobile Services
      7. Application Services
      8. Networking
      9. Security and Identity
      10. Monitoring and Management
    4. Summary
  16. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think