Prometheus: Up & Running, 2nd Edition

Book description

Get up to speed with Prometheus, the metrics-based monitoring system used in production by tens of thousands of organizations. This updated second edition provides site reliability engineers, Kubernetes administrators, and software developers with a hands-on introduction to the most important aspects of Prometheus, including dashboarding and alerting, direct code instrumentation, and metric collection from third-party systems with exporters.

Prometheus server maintainer Julien Pivotto and core developer Brian Brazil demonstrate how you can use Prometheus for application and infrastructure monitoring. This book guides you through setting up Prometheus, the Node Exporter, and the Alertmanager, and then shows you how to put these tools to work. You'll understand why this open source system has continued to gain popularity in recent years.

You will:

  • Know where and how much instrumentation to apply to your application code
  • Monitor your infrastructure with Node Exporter and use new collectors for network and system pressure metrics
  • Get an introduction to Grafana, a popular tool for building dashboards
  • Use service discovery, including the new HTTP SD mechanism, to provide different views of your machines and services
  • Use Prometheus with Kubernetes and examine exporters you can use with containers
  • Discover Prometheus's new improvements and features, including trigonometric functions
  • Learn how Prometheus supports important security features including TLS and basic authentication

Table of contents

  1. Preface
    1. Expanding the Known
    2. The Evolution of Prometheus
    3. Conventions Used in This Book
    4. Using Code Examples
    5. O’Reilly Online Learning
    6. How to Contact Us
    7. Acknowledgments
  2. I. Introduction
  3. 1. What Is Prometheus?
    1. What Is Monitoring?
      1. A Brief and Incomplete History of Monitoring
      2. Categories of Monitoring
    2. Prometheus Architecture
      1. Client Libraries
      2. Exporters
      3. Service Discovery
      4. Scraping
      5. Storage
      6. Dashboards
      7. Recording Rules and Alerts
      8. Alert Management
      9. Long-Term Storage
    3. What Prometheus Is Not
  4. 2. Getting Started with Prometheus
    1. Running Prometheus
    2. Using the Expression Browser
    3. Running the Node Exporter
    4. Alerting
  5. II. Application Monitoring
  6. 3. Instrumentation
    1. A Simple Program
    2. The Counter
      1. Counting Exceptions
      2. Counting Size
    3. The Gauge
      1. Using Gauges
      2. Callbacks
    4. The Summary
    5. The Histogram
      1. Buckets
    6. Unit Testing Instrumentation
    7. Approaching Instrumentation
      1. What Should I Instrument?
      2. How Much Should I Instrument?
      3. What Should I Name My Metrics?
  7. 4. Exposition
    1. Python
      1. WSGI
      2. Twisted
      3. Multiprocess with Gunicorn
    2. Go
    3. Java
      1. HTTPServer
      2. Servlet
    4. Pushgateway
    5. Bridges
    6. Parsers
    7. Text Exposition Format
      1. Metric Types
      2. Labels
      3. Escaping
      4. Timestamps
      5. check metrics
    8. OpenMetrics
      1. Metric Types
      2. Labels
      3. Timestamps
  8. 5. Labels
    1. What Are Labels?
    2. Instrumentation and Target Labels
    3. Instrumentation
      1. Metric
      2. Multiple Labels
      3. Child
    4. Aggregating
    5. Label Patterns
      1. Enum
      2. Info
    6. When to Use Labels
      1. Cardinality
  9. 6. Dashboarding with Grafana
    1. Installation
    2. Data Source
    3. Dashboards and Panels
      1. Avoiding the Wall of Graphs
    4. Time Series Panel
      1. Time Controls
    5. Stat Panel
    6. Table Panel
    7. State Timeline Panel
    8. Template Variables
  10. III. Infrastructure Monitoring
  11. 7. Node Exporter
    1. CPU Collector
    2. Filesystem Collector
    3. Diskstats Collector
    4. Netdev Collector
    5. Meminfo Collector
    6. Hwmon Collector
    7. Stat Collector
    8. Uname Collector
    9. OS Collector
    10. Loadavg Collector
    11. Pressure Collector
    12. Textfile Collector
      1. Using the Textfile Collector
      2. Timestamps
  12. 8. Service Discovery
    1. Service Discovery Mechanisms
      1. Static
      2. File
      3. HTTP
      4. Consul
      5. EC2
    2. Relabeling
      1. Choosing What to Scrape
      2. Target Labels
    3. How to Scrape
      1. metric_relabel_configs
      2. Label Clashes and honor_labels
  13. 9. Containers and Kubernetes
    1. cAdvisor
      1. CPU
      2. Memory
      3. Labels
    2. Kubernetes
      1. Running in Kubernetes
      2. Service Discovery
      3. kube-state-metrics
    3. Alternative Deployments
  14. 10. Common Exporters
    1. Consul
    2. MySQLd
    3. Grok Exporter
    4. Blackbox
      1. ICMP
      2. TCP
      3. HTTP
      4. DNS
      5. Prometheus Configuration
  15. 11. Working with Other Monitoring Systems
    1. Other Monitoring Systems
    2. InfluxDB
    3. StatsD
  16. 12. Writing Exporters
    1. Consul Telemetry
    2. Custom Collectors
      1. Labels
    3. Guidelines
  17. IV. PromQL
  18. 13. Introduction to PromQL
    1. Aggregation Basics
      1. Gauge
      2. Counter
      3. Summary
      4. Histogram
    2. Selectors
      1. Matchers
      2. Instant Vector
      3. Range Vector
      4. Subqueries
      5. Offset
      6. At Modifier
    3. HTTP API
      1. query
      2. query_range
  19. 14. Aggregation Operators
    1. Grouping
      1. without
      2. by
    2. Operators
      1. sum
      2. count
      3. avg
      4. group
      5. stddev and stdvar
      6. min and max
      7. topk and bottomk
      8. quantile
      9. count_values
  20. 15. Binary Operators
    1. Working with Scalars
      1. Arithmetic Operators
      2. Trigonometric Operator
      3. Comparison Operators
    2. Vector Matching
      1. One-to-One
      2. Many-to-One and group_left
      3. Many-to-Many and Logical Operators
    3. Operator Precedence
  21. 16. Functions
    1. Changing Type
      1. vector
      2. scalar
    2. Math
      1. abs
      2. ln, log2, and log10
      3. exp
      4. sqrt
      5. ceil and floor
      6. round
      7. clamp, clamp_max, and clamp_min
      8. sgn
      9. Trigonometric Functions
    3. Time and Date
      1. time
      2. minute, hour, day_of_week, day_of_month, day_of_year, days_in_month, month, and year
      3. timestamp
    4. Labels
      1. label_replace
      2. label_join
    5. Missing Series, absent, and absent_over_time
    6. Sorting with sort and sort_desc
    7. Histograms with histogram_quantile
    8. Counters
      1. rate
      2. increase
      3. irate
      4. resets
    9. Changing Gauges
      1. changes
      2. deriv
      3. predict_linear
      4. delta
      5. idelta
      6. holt_winters
    10. Aggregation Over Time
  22. 17. Recording Rules
    1. Using Recording Rules
    2. When to Use Recording Rules
      1. Reducing Cardinality
      2. Composing Range Vector Functions
      3. Rules for APIs
      4. How Not to Use Rules
    3. Naming of Recording Rules
  23. V. Alerting
  24. 18. Alerting
    1. Alerting Rules
      1. for
      2. Alert Labels
      3. Annotations and Templates
      4. What Are Good Alerts?
    2. Configuring Alertmanagers in Prometheus
      1. External Labels
  25. 19. Alertmanager
    1. Notification Pipeline
    2. Configuration File
      1. Routing Tree
      2. Receivers
      3. Inhibitions
    3. Alertmanager Web Interface
  26. VI. Deployment
  27. 20. Server-Side Security
    1. Security Features Provided by Prometheus
    2. Enabling TLS
    3. Advanced TLS Options
    4. Enabling Basic Authentication
  28. 21. Putting It All Together
    1. Planning a Rollout
    2. Growing Prometheus
    3. Going Global with Federation
    4. Long-Term Storage
    5. Running Prometheus
      1. Hardware
      2. Configuration Management
      3. Networks and Authentication
    6. Planning for Failure
      1. Alertmanager Clustering
      2. Meta- and Cross-Monitoring
    7. Managing Performance
      1. Detecting a Problem
      2. Finding Expensive Metrics and Targets
      3. Reducing Load
      4. Horizontal Sharding
    8. Managing Change
    9. Getting Help
  29. Index
  30. About the Authors

Product information

  • Title: Prometheus: Up & Running, 2nd Edition
  • Author(s): Julien Pivotto, Brian Brazil
  • Release date: April 2023
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098131142