Managing Cloud Native Data on Kubernetes

by Jeff Carpenter, Patrick McFadin
Released December 2022
Publisher(s): O'Reilly Media, Inc.
ISBN: 9781098111397

Book description

Kubernetes has become the primary platform for deploying and managing cloud native applications. But because it was originally designed for stateless workloads, working with data on Kubernetes has been challenging. If you want to avoid the inefficiencies and duplicative costs of having separate infrastructure for applications and data, this practical guide can help.

Using Kubernetes as your platform, you'll discover open source technologies that are designed and built for the cloud. Delve into case studies to avoid the pitfalls others have faced and explore new use cases. Get an insider's view of what's coming from the innovators who are creating next-generation architectures and infrastructure. And you'll learn how to:

  • Manage different data use cases on Kubernetes
  • Reduce costs and simplify application development
  • Leverage data and infrastructure to create new use cases and business models
  • Make data infrastructure choices that are cost-efficient, secure, scalable, and elastic
  • And more

Table of contents

  1. Introduction to Cloud Native Data Infrastructure: Persistence, Streaming, and Batch Analytics
    1. Infrastructure Types
    2. What is Cloud Native Data?
    3. More Infrastructure, More Problems
    4. Kubernetes Leading the Way
      1. Managing Compute on Kubernetes
      2. Managing Network on Kubernetes
      3. Managing Storage on Kubernetes
    5. Cloud native data components
    6. Looking forward
    7. Getting ready for the revolution
      1. Adopt an SRE mindset
      2. Embrace Distributed Computing
    8. Summary
  2. Managing Data Storage on Kubernetes
    1. Docker, Containers, and State
      1. Managing State in Docker
      2. Bind mounts
      3. Volumes
      4. Tmpfs Mounts
      5. Volume Drivers
    2. Kubernetes Resources for Data Storage
      1. Pods and Volumes
      2. PersistentVolumes
      3. PersistentVolumeClaims
      4. StorageClasses
    3. Kubernetes Storage Architecture
      1. Flexvolume
      2. Container Storage Interface (CSI)
      3. Container Attached Storage
      4. Container Object Storage Interface (COSI)
    4. Summary
  3. Databases on Kubernetes the Hard Way
    1. The Hard Way
    2. Prerequisites for running data infrastructure on Kubernetes
    3. Running MySQL on Kubernetes
      1. ReplicaSets
      2. Deployments
      3. Services
      4. Accessing MySQL
    4. Running Apache Cassandra on Kubernetes
      1. StatefulSets
      2. Accessing Cassandra
    5. Summary
  4. Automating Database Deployment on Kubernetes with Helm
    1. Deploying Applications with Helm charts
      1. Using Helm to deploy MySQL
      2. Using Helm to deploy Apache Cassandra
      3. Helm Limitations
    2. Summary
  5. Automating Database Management on Kubernetes with Operators
    1. Extending the Kubernetes Control Plane
      1. Extending Kubernetes Clients
      2. Extending Kubernetes Control Plane Components
      3. Extending Kubernetes Worker Node Components
    2. The Operator Pattern
      1. Controllers
      2. Events
      3. Custom Resources
      4. Operators
    3. Managing MySQL in Kubernetes using the Vitess Operator
      1. Vitess Overview
      2. PlanetScale Vitess Operator
    4. A Growing Ecosystem of Operators
      1. Choosing Operators
    5. Summary
  6. Integrating Data Infrastructure in a Kubernetes Stack
    1. K8ssandra: Production-ready Cassandra on Kubernetes
      1. K8ssandra Architecture
      2. Installing the K8ssandra Operator
      3. Creating a K8ssandraCluster
    2. Managing Cassandra in Kubernetes with Cass Operator
      1. Enabling Developer Productivity with Stargate APIs
      2. Unified Monitoring Infrastructure with Prometheus and Grafana
      3. Performing Repairs with Cassandra-Reaper
      4. Backing up and Restoring Data with Cassandra Medusa
      5. Deploying Multi-cluster applications in Kubernetes
    3. Summary
  7. The Kubernetes Native Database
    1. Why a Kubernetes native approach is needed
    2. Hybrid data access at scale with TiDB
      1. TiDB Architecture
    3. Serverless Cassandra with DataStax Astra DB
    4. What to look for in a Kubernetes Native Database
      1. Basic requirements
      2. The Future of Kubernetes Native
    5. Summary
  8. Streaming Data on Kubernetes
    1. Introduction to Streaming
      1. Types of delivery
      2. Delivery Guarantees
      3. Feature scope
    2. The Role of Streaming in Kubernetes
    3. Streaming on Kubernetes with Apache Pulsar™
      1. Preparing Your Environment
      2. Securing Communications by Default with Cert-manager
      3. Using Helm to Deploy Apache Pulsar™
    4. Stream Analytics with Apache Flink™
      1. Deploying Apache Flink™ on Kubernetes
    5. Summary
  9. Data Analytics on Kubernetes
    1. Introduction to Analytics
    2. Deploying Analytic Workloads in Kubernetes
    3. Introduction to Apache Spark™
    4. Deploying Apache Spark™ in Kubernetes
    5. Kubernetes Operator for Apache Spark
    6. Alternative Schedulers for Kubernetes
      1. Apache YuniKorn™
      2. Volcano
    7. Analytic Engines for Kubernetes
      1. Dask
      2. Ray
    8. Summary
  10. Machine Learning and Other Emerging Use Cases for Data on Kubernetes
    1. The Cloud Native AI/ML Stack
      1. AI/ML Definitions
      2. Defining an AI/ML Stack
      3. Real-time Model Serving with KServe
      4. Full-Lifecycle Feature Management with Feast
    2. Efficient Data Movement with Apache Arrow
    3. Versioned Object Storage with LakeFS
    4. Summary

