Learn how to leverage Kubernetes for efficient auto-scaling in micro-service architectures. This blog explores Horizontal Pod Autoscaling (HPA) using internal and external metrics, including the setup of Prometheus Adapter for monitoring external Redis servers. Follow our step-by-step guide to optimize resource utilization and ensure minimal downtime in your Kubernetes deployments.
Modern micro-service architectures are managed by orchestrators. Kubernetes is one of them, providing benefits such as resource optimization, minimal or zero downtime deployments, reliability and auto-scaling, to name a few. Auto-scaling solutions are feedback loops driven by specific metrics, typically traffic throughput or resource utilization (CPU/memory) of the services. These metrics originate inside the cluster and are monitored to take auto-scaling decisions, but what about external metrics? This blog covers both kinds of metrics for deploying an auto-scaling solution that we used in production for a client.
One of our clients was using a Redis server which was outside of the Kubernetes cluster. We had to collect metrics from the Redis queues and auto-scale the pods based on a threshold.
What is Horizontal Pod Autoscaling (HPA)?
Kubernetes is inherently scalable, providing a number of tools that allow applications as well as the infrastructure to scale up and down depending on demand, efficiency and a number of other metrics. What I’m going to discuss in this article is one such feature, which allows the user to horizontally scale Pods based on certain metrics, provided either by Kubernetes itself or as custom metrics generated by the user.
The Horizontal Pod Autoscaler automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on some metrics. It is implemented as a Kubernetes API resource and a controller.
The HPA controller retrieves metrics from a series of APIs, which include:
metrics.k8s.io
API for resource metrics. These include metrics like CPU/memory usage of a Pod.
custom.metrics.k8s.io
API for custom metrics. These can be defined by using operators and are generated from within the cluster, for example by the Prometheus Operator.
external.metrics.k8s.io
API for external metrics. These metrics originate from outside the Kubernetes cluster, for example the number of pending jobs in an external Redis queue, and have to be made available to the cluster so that the HPA controller can monitor them.
How are we going to implement HPA?
For this article, we will be using the Prometheus Adapter to make a Prometheus metric available to the Kubernetes cluster as an external metric.
The following steps outline how HPA can be implemented in the cluster:
There will be an application running in the cluster, which connects to the external Redis service to pick up the next job from the queue.
After picking up a job from the queue, the application will send the number of pending jobs remaining in the queue to StatsD as a gauge metric (a minimal sketch follows these steps).
The external Prometheus will scrape StatsD and now has the metric.
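As a minimal sketch of step 2, the gauge can be pushed using the StatsD plaintext protocol over UDP; the metric name, StatsD host and port below are illustrative assumptions:

# Report 42 pending jobs as a gauge via the StatsD plaintext protocol over UDP
echo "trigger_prod_hpa:42|g" | nc -u -w0 statsd.example.com 8125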
For example, below is the metric that will be used to trigger the autoscaling event:
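Reconstructed from the metric name and labels used later in this article (the sample value of 50 is illustrative), the Prometheus time series looks like this:

trigger_prod_hpa{instance="sh119.global.temp.domains/~onetwoni",job="trigger_prod_hpa"} 50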
Now, you can deploy the Prometheus Adapter in the cluster to query the external Prometheus and expose the metric to the cluster via the external metric API.
The manifests for deploying the Prometheus Adapter can be found in the adapter’s git repository, under the deploy/manifests directory.
The following changes are to be made to the Prometheus Adapter’s manifests:
Update the URL for the Prometheus service in deploy/manifests/custom-metrics-apiserver-deployment.yaml.
Update deploy/manifests/custom-metrics-config-map.yaml with the correct rule for querying Prometheus. For example, our metric is named trigger_prod_hpa, which has the labels {instance="sh119.global.temp.domains/~onetwoni",job="trigger_prod_hpa"}. The corresponding Prometheus Adapter rule for the above metric would be:
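The adapter’s configuration schema has changed across versions, so treat the following as a minimal sketch that assumes a release supporting the externalRules block; check the adapter’s configuration documentation for the exact fields of your version:

externalRules:
# select the external time series in Prometheus
- seriesQuery: 'trigger_prod_hpa{job="trigger_prod_hpa"}'
  # the query the adapter runs against Prometheus when the metric is requested
  metricsQuery: 'max(<<.Series>>)'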
The git repository for the Prometheus Adapter contains a very detailed explanation on how the adapter rules are written.
TLS certificates will have to be generated for the Prometheus Adapter.
A Makefile can be used to generate the certs.
Run the following commands to generate the manifest which will contain the certs as a k8s secret:
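The exact target depends on the Makefile being used; with the commonly used cert-generation Makefiles for the adapter it is typically a single target (the target name here is an assumption, so check the Makefile itself):

# Generates the serving certs and writes them out as a Kubernetes secret manifest
make certs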
The above commands will generate a yaml file which has the secret configured.
Copy the generated cm-adapter-serving-certs.yaml to Prometheus Adapter’s deploy/manifests directory.
Note: Make sure that the namespace of the generated secret is the same as the namespace used by the manifests in Prometheus Adapter’s deploy/manifests.
After the adapter has been successfully deployed, we now have to confirm that the adapter configuration is applied correctly:
Confirm that the external.metrics.k8s.io API is active and aware of the metric:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Next, confirm that the metric’s value from Prometheus is correctly available to the cluster:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/*/trigger_prod_hpa" | jq .
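If everything is wired up correctly, the response is an ExternalMetricValueList; its shape is roughly as follows (the timestamp and value are illustrative):

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "trigger_prod_hpa",
      "metricLabels": {},
      "timestamp": "2020-01-01T00:00:00Z",
      "value": "50"
    }
  ]
}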
Now that we have confirmed that our external metric is available to the cluster, we are ready to define the HPA configuration which will have:
the threshold value, compared against the external metric, that triggers the autoscaling event
minimum number of Pods that must be running when the value is below the threshold
maximum number of Pods that can be scaled up to when the value crosses the threshold
Below is the HPA configuration yaml:
hpa.yaml
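The original manifest is not reproduced here, so below is a sketch reconstructed from the values used in this article (threshold 40, minimum of 3 replicas) using the autoscaling/v2 API; the HPA name, the target Deployment name and the maximum of 6 replicas are assumptions:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: trigger-prod-hpa       # assumption: name of the HPA object
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker         # assumption: the queue-consumer Deployment
  minReplicas: 3               # replicas maintained while the metric stays below the threshold
  maxReplicas: 6               # assumption: upper bound for scale-up
  metrics:
  - type: External
    external:
      metric:
        name: trigger_prod_hpa # external metric exposed by the Prometheus Adapter
      target:
        type: Value
        value: "40"            # threshold that triggers the autoscaling event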
The above configuration can be applied by the following command:
kubectl apply -f hpa.yaml
We can check the applied HPA configuration by running the following command:
kubectl describe hpa -n <namespace>
As can be seen above, the HPA configuration has been applied to the cluster and the HPA controller is able to access the external metric correctly. It will compare the value of the external metric against the threshold, and when the metric crosses the threshold, it will trigger a scale-up action. Similarly, when the external metric’s value goes below the threshold, the HPA controller will trigger a scale-down action.
The HPA controller keeps track of the desired number of Pods based on the following formula:
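desiredReplicas = ceil[ currentReplicas * ( currentMetricValue / desiredMetricValue ) ]

Here desiredMetricValue is the target (threshold) value from the HPA configuration, and currentMetricValue is what the external metric currently reports.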
Test the HPA
In order to test our HPA configuration and make sure that the scaling up/down occurs correctly, we will update the value of the trigger_prod_hpa metric to a value above the threshold.
Update the value of the trigger_prod_hpa metric at Prometheus to a value above “40” (the threshold we have set). Let’s set the value to “50”, and after the scale-up event has happened, update it to “30”. Once the metric value is updated, the HPA controller starts scaling up our Pods until the maximum allowed number is reached.
When the value of the trigger_prod_hpa metric eventually falls below the threshold, the HPA controller will start scaling down the Pods based on the formula mentioned above.
It is possible to track the number of replicas you will end up with after scaling down:
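For example, assuming our illustrative numbers (metric at 30, threshold 40, and a scale-up that reached the assumed maximum of 6 replicas), repeatedly applying the formula above gives:

ceil(6 * 30/40) = 5
ceil(5 * 30/40) = 4
ceil(4 * 30/40) = 3
ceil(3 * 30/40) = 3   (stable, and equal to our configured minimum)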
As can be seen, the HPA ends up maintaining 3 replicas, our configured minimum.
Conclusion
We were able to use the Horizontal Pod Autoscaling feature of Kubernetes by ingesting external metrics into the cluster, defining the HPA configuration, and letting Kubernetes handle auto-scaling the pods.
Always set both the maximum and the minimum number of replicas carefully. There can be situations where the maximum number of replicas is not enough, or the minimum is more than desired.