Chapter 3. Autoscaling Knative Services

Serverless-style architecture is not only about terminating your services when they are not in use but also about scaling them up based on demand. Knative handles these requirements effectively using its scale-to-zero and autoscaling capabilities:


After a period of inactivity, the Revision of your Knative Serving Service is considered inactive. Knative terminates all the pods that belong to that inactive Revision and maps the Revision's Routes to Knative Serving's activator service. The activator then becomes the endpoint that receives and buffers your end users' HTTP traffic. This buffering gives the autoscaler time to do its job, which here means scaling the Revision back from zero to n pods.


Autoscaling is the ability for the Knative Service to scale out its pods based on inbound HTTP traffic. The autoscaling feature of Knative is managed by:

  • Knative Pod Autoscaler (KPA)

  • Horizontal Pod Autoscaler (HPA), the default autoscaler built into Kubernetes

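Knative autoscaling relies on three important metrics: concurrency, requests per second, and CPU. The KPA scales on concurrency or requests per second, while CPU-based scaling is handled by the HPA. The KPA can be thought of as an extended version of the HPA, with a few tweaks to the default HPA algorithms that make it better suited to the more dynamic, load-driven scaling requirements of Knative. Unlike the HPA, the KPA can also scale a Revision down to zero.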


With our current minikube setup, a small Kubernetes cluster with limited resources, it is easy to demonstrate ...
