Discover the key differences between Karpenter and Cluster Autoscaler for Kubernetes node scaling. Learn how each tool handles dynamic workloads, node provisioning, and cost optimization to choose the best solution for your cloud-native environment.
In cloud-native environments, Kubernetes (K8s) has become essential for managing containerized applications at scale. To handle fluctuating workloads efficiently, autoscaling is key. Tools like Karpenter and Cluster Autoscaler (CA) enable Kubernetes to respond dynamically to workload needs by adjusting cluster resources. Each tool offers unique scaling capabilities and flexibility, making it crucial to understand their distinctions, use cases, and configurations. Here’s an in-depth comparison to help identify which tool best suits your Kubernetes cluster.
Cluster Autoscaler (CA)
Purpose:
The Cluster Autoscaler automatically adjusts the node count in your Kubernetes cluster, scaling up when workloads need additional resources and scaling down when demand decreases.
Key Point:
CA works with predefined node groups (collections of virtual machines) and scales according to pod demand, which suits more predictable workload environments.
Sample Configuration (Priority Expander):
A priority expander configuration helps CA determine which node groups to scale based on assigned priorities:
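A minimal sketch of such a ConfigMap is shown below; the node-group name patterns are placeholders, and it assumes CA is running with the --expander=priority flag:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander   # name expected by CA
  namespace: kube-system
data:
  priorities: |-
    # higher number = higher priority; values are regexes matched
    # against node group names (patterns below are placeholders)
    50:
      - .*-spot-.*        # prefer spot-backed node groups
    10:
      - .*-on-demand-.*   # fall back to on-demand node groups
```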
This example sets priority levels for each node group, allowing CA to scale according to preset rules and resource availability.
Karpenter
Purpose:
Karpenter provides more flexible and optimized scaling than CA. It dynamically adjusts not only the number of nodes but also their types, selecting the best match for each workload based on real-time needs.
Key Point:
Karpenter is suited for dynamic environments where workload demands vary significantly. It allows you to utilize various instance types (including spot instances) for cost savings and flexibility, unlike CA, which depends on predefined node groups.
Sample Configuration (Provisioner):
The following provisioner configuration allows Karpenter to scale flexibly:
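A minimal sketch using the v1alpha5 Provisioner API (newer Karpenter releases replace this with a NodePool resource); the instance types, limits, and providerRef name are illustrative:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type          # allow both spot and on-demand
      operator: In
      values: ["spot", "on-demand"]
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
    - key: node.kubernetes.io/instance-type    # candidate instance types
      operator: In
      values: ["m5.large", "m5.xlarge", "c5.large", "c5.xlarge"]
  limits:
    resources:
      cpu: "1000"                              # cap total provisioned CPU
  ttlSecondsAfterEmpty: 30                     # remove empty nodes quickly
  providerRef:
    name: default                              # AWSNodeTemplate with subnets/SGs
```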
With this setup, Karpenter can decide on instance types and resource allocations based on workload requirements, offering the ability to switch between on-demand and spot instances as necessary.
Comparison: How Karpenter and Cluster Autoscaler Work
Cluster Autoscaler (CA):
Mechanism: CA works directly with EKS-managed node groups, scaling within predefined groups according to rules such as the priority expander ConfigMap. It relies on Kubernetes Scheduler signals, adding nodes when pending pods cannot be scheduled and removing underutilized ones.
Scaling Methodology: The priority expander configuration in CA controls scaling decisions based on node group priorities.
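As a rough illustration of that wiring (the image tag and cluster name are placeholders), the priority expander is selected through a flag on the cluster-autoscaler Deployment:

```yaml
# Excerpt from the cluster-autoscaler Deployment in kube-system;
# <CLUSTER_NAME> is a placeholder for your EKS cluster name.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --expander=priority        # use the priority expander ConfigMap
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<CLUSTER_NAME>
```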
Karpenter:
Mechanism: Karpenter doesn’t depend on EKS-managed node groups. Instead, it uses the EC2 Fleet API to create instances as needed. A provisioner in Karpenter defines instance requirements, allowing Karpenter to select appropriate instance types on demand.
Scaling Methodology: Karpenter dynamically provisions instances based on labels and node affinity, accommodating specific pod requirements by calling the EC2 Fleet API for fine-grained control over instance selection.
How Each Tool Handles Scaling In and Out
Cluster Autoscaler:
CA relies on priority values in the expander ConfigMap, which uses regular expressions to match node group names. The highest-priority matching group is scaled first, adding nodes when the Kubernetes Scheduler cannot find a suitable node for pending pods and removing them when resources are underused.
Karpenter:
Karpenter uses node affinity and labels defined in the provisioner configuration to match workload requirements to node capabilities. It will expand resources by dynamically launching EC2 instances based on current pod demands, adjusting for architecture, OS type, instance type, and capacity type as specified.
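For example, a workload can express those requirements directly in its pod spec; the sketch below uses the well-known Kubernetes and Karpenter node labels, while the pod name, image, and instance types are placeholders:

```yaml
# Hypothetical workload: its scheduling requirements are read by Karpenter,
# which launches a matching EC2 instance if no suitable node exists.
apiVersion: v1
kind: Pod
metadata:
  name: demo-batch-job
spec:
  nodeSelector:
    karpenter.sh/capacity-type: spot           # run on spot capacity
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values: ["c5.xlarge", "c5.2xlarge"]
  containers:
    - name: app
      image: public.ecr.aws/docker/library/nginx:latest
      resources:
        requests:
          cpu: "2"
          memory: 4Gi
```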
EKS Upgrade Considerations
Cluster Autoscaler:
During EKS upgrades, CA benefits from managed node groups, which update nodes as part of a rolling upgrade or force-update process. No CA-specific manual steps are required.
Karpenter:
Upgrading with Karpenter requires manual steps: updating the EC2 launch template (or AMI configuration) that Karpenter uses with the latest Amazon Machine Image (AMI) compatible with the new EKS version. Existing nodes may then need to be manually drained and removed to ensure only upgraded instances are deployed.
Node Types: On-Demand vs. Spot
Cluster Autoscaler: Operates only on predefined node types within managed node groups. CA does not allow dynamic selection between on-demand and spot instances; these must be preconfigured in the node group.
Karpenter: Karpenter’s provisioner allows for flexible node selection, including switching between on-demand and spot instances based on availability and cost optimization needs. This capability is particularly valuable in cost-sensitive environments where spot instance usage can yield substantial savings.
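As a sketch, this choice is expressed through the same capacity-type requirement shown in the Provisioner above; restricting the allowed values pins Karpenter to spot (or on-demand) capacity:

```yaml
# Requirements excerpt for a spot-only Provisioner; adding "on-demand"
# back to the list lets Karpenter fall back to on-demand capacity.
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot"]
```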
When to Use Cluster Autoscaler vs. Karpenter
Cluster Autoscaler:
Ideal for predictable, stable environments with predefined node groups:
Predictable Node Groups: For relatively fixed scaling needs within predefined groups, CA offers a straightforward, dependable solution.
Stable Workloads: CA is suitable for workloads that do not need rapid scaling or flexibility, such as batch processing tasks with set scaling parameters.
Mature Production Environments: As a well-established, Kubernetes-integrated tool, CA is supported by cloud providers and offers reliable stability for standard workloads.
Karpenter:
Best for dynamic, cost-sensitive environments requiring flexibility:
Cost Optimization: Karpenter enables dynamic instance selection, including spot instances, providing cost savings while adapting to pricing and demand fluctuations.
Multi-Zone or Multi-Region Support: For workloads that span multiple zones or regions, Karpenter can provision resources across these areas, enhancing availability and performance.
Diverse Resource Requirements: For complex workloads requiring varied instance types, such as AI/ML tasks with GPU needs or memory-intensive applications, Karpenter’s provisioner allows for real-time, workload-specific scaling.
Case Study: Handling Node Group Updates with Karpenter
For one of our clients, we solved the challenge of managing node group updates during EKS upgrades by using Karpenter to streamline the process. The approach, focused on environments that use EC2 Launch Templates within EKS node groups, addressed critical upgrade challenges, minimized operational overhead, and ensured uninterrupted service.
Purpose:
The goal was to transition workloads to an EKS node group that uses an EC2 Launch Template, allowing for a smoother, automated upgrade process.
Key Advantages of This Solution:
Seamless EKS Upgrades: During EKS version upgrades, AWS automatically applies updates to the node template, incorporating the latest AMI ID without manual intervention. This enables Karpenter to deploy new nodes with the updated template, reducing administrative effort.
Automated Node Upgrades: In scenarios with EKS cluster auto-upgrades, both master and worker nodes are automatically upgraded, removing the need to manually update AMIs in Karpenter’s configuration and minimizing operational maintenance.
Zero Downtime for Services: This approach ensures that during upgrades, Karpenter transitions workloads to the updated nodes without service disruptions, maintaining high availability throughout the process.
Flexible Node Launching with Karpenter: By launching instances outside of predefined node groups yet within the EKS cluster, Karpenter allows for a flexible and optimized scaling strategy. This independence from managed node groups enables tailored resource allocation while ensuring the nodes remain part of the EKS cluster, enhancing both performance and scalability.
By integrating Karpenter with EC2 Launch Templates, this solution enabled automated, uninterrupted upgrades and efficient scaling, demonstrating how Kubernetes clusters can be managed with both flexibility and operational efficiency.
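As an illustration of the automated-AMI behavior described above (not the client’s exact configuration), a Karpenter node template can delegate AMI selection to an amiFamily, so nodes launched after a control-plane upgrade pick up the latest EKS-optimized AMI for the cluster’s Kubernetes version. This sketch assumes the v1alpha5-era AWSNodeTemplate API (newer Karpenter releases use EC2NodeClass) and placeholder discovery tags:

```yaml
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  amiFamily: AL2                      # Karpenter resolves the latest EKS-optimized AMI
  subnetSelector:
    karpenter.sh/discovery: my-cluster   # placeholder discovery tag
  securityGroupSelector:
    karpenter.sh/discovery: my-cluster
```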