Node Autoscalers

Overview

Node autoscalers are tools used in Kubernetes environments to automatically scale the number of nodes in a cluster based on current workloads. Autoscalers help ensure that applications have the necessary compute resources, scaling up during traffic spikes and scaling down during idle periods to reduce costs. While powerful and commonly used, they can also be complex to configure and manage.

Komodor’s Node Autoscaler Add-on provides visibility into the scaling behavior of your Kubernetes clusters, helping you identify the impact of scale-down events, cost inefficiencies, and configuration issues across all your clusters.

Supported Autoscalers:

  • Cluster Autoscaler (detected by labels containing cluster-autoscaler)
  • Karpenter (detected by labels containing karpenter)
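As an illustration of this label-based detection, the following sketch uses the Kubernetes Python client to check which supported autoscaler is deployed in the current cluster. This is an assumed approach for manual verification, not Komodor's actual implementation:

    # Hedged sketch: scan Deployment labels for "cluster-autoscaler" or "karpenter".
    # Illustrative only; requires the 'kubernetes' Python client package.
    from kubernetes import client, config

    config.load_kube_config()          # or config.load_incluster_config() inside a pod
    apps = client.AppsV1Api()

    detected = set()
    for deploy in apps.list_deployment_for_all_namespaces().items:
        labels = deploy.metadata.labels or {}
        haystack = " ".join(list(labels.keys()) + list(labels.values())).lower()
        if "cluster-autoscaler" in haystack:
            detected.add("Cluster Autoscaler")
        if "karpenter" in haystack:
            detected.add("Karpenter")

    print(detected or "no supported node autoscaler detected")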

Key capabilities

Node Autoscalers Overview

  • Autoscaler type per cluster
  • Node pools and their configurations
  • CPU & memory: capacity vs. allocation vs. usage (see the sketch after this list)
  • Pending pods and scaling activity over time
  • Direct links to logs, events, and configuration details
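For reference, capacity and allocation can be approximated directly from the Kubernetes API: capacity comes from each node's allocatable resources, while allocation is the sum of resource requests of the pods scheduled on it (usage additionally requires metrics-server). The sketch below is a simplified, assumed approach using the Kubernetes Python client, not Komodor's implementation:

    # Illustrative only: per-node CPU capacity (allocatable) vs. allocated CPU
    # (sum of container requests of running pods scheduled on the node).
    from kubernetes import client, config
    from kubernetes.utils import parse_quantity   # available in recent client versions

    config.load_kube_config()
    v1 = client.CoreV1Api()

    capacity = {n.metadata.name: parse_quantity(n.status.allocatable["cpu"])
                for n in v1.list_node().items}
    allocated = {name: 0 for name in capacity}

    for pod in v1.list_pod_for_all_namespaces(field_selector="status.phase=Running").items:
        node = pod.spec.node_name
        if node not in allocated:
            continue
        for c in pod.spec.containers:
            requests = (c.resources.requests if c.resources else None) or {}
            allocated[node] += parse_quantity(requests.get("cpu", "0"))

    for node, cap in capacity.items():
        print(f"{node}: capacity={cap} CPU, allocated={allocated[node]} CPU")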

Actionable Insights

Automatically detect reliability risks and inefficiencies:

  • Scale-down impact: Detects workload disruptions when nodes are scaled down.
  • Over-provisioned cluster capacity: Identifies clusters whose resource allocation is much lower than their actual capacity, impacting cost and optimization.
  • Pending Pods: Surfaces pods in an unschedulable state for an extended period of time.

Each insight includes a detailed impact analysis and suggested fixes; these insights are also powered by KlaudiaAI.

How it works:

Komodor scans your environment to:

  • Detect deployed autoscalers
  • Analyze configuration parameters
  • Monitor node and pod activity over time
  • Correlate events like evictions, job failures, or availability issues with autoscaling events

Under Kubernetes Add-ons → Autoscalers in the left-side navigation bar, Komodor provides a unified dashboard to view all Node Autoscalers across your clusters. The dashboard includes:

  • A quick glance into all autoscalers in your environment, surfacing information such as CPU & memory allocation and capacity, number of pending pods and managed nodes, and whether Komodor detected any Risks.
  • Quick filters to narrow down the view based on the autoscaler you are interested in.

Autoscaler Details:

Clicking on an autoscaler opens a detailed view with the following information:

  • Komodor Risks - insights surfaced by Komodor to give you visibility into your autoscaler’s impact and help you improve its configuration as needed
  • Nodes Metrics
  • Autoscaler configuration parameters
  • Quick access to all service information 

Autoscaler Reliability Risks:

As mentioned above, Komodor surfaces several reliability risks when they are detected in the cluster.

These risks can be found within the autoscaler view, as well as under the Infrastructure Health -> Add-ons risks impact group.

Scale-down impact:

Detects workload disruptions due to node scale-downs. This risk is triggered when scale-downs are detected alongside actual impact (availability issues). The number of availability issues required to trigger the risk is configurable in Organization Settings -> Health Policies -> Reliability Policies.

The risk presents the sequence of events over time (scale-downs, availability issues, affected jobs & workflows) and allows you to drill down into the unhealthy services to understand the impact. Komodor also suggests options to reduce the possible impact.
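If you want to cross-check this activity outside Komodor, scale-downs are usually visible as Kubernetes events. The snippet below is a rough, assumed filter (event reason strings vary between Cluster Autoscaler, Karpenter, and versions) for listing events that look like scale-downs, so they can be compared against availability issues in the same time window:

    # Rough sketch: list recent events whose reason or message suggests a node
    # scale-down. The substring filter is an assumption, not an exact match.
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    for event in v1.list_event_for_all_namespaces().items:
        text = f"{event.reason or ''} {event.message or ''}".lower()
        if "scaledown" in text or "scale down" in text:
            print(event.last_timestamp, event.involved_object.kind,
                  event.involved_object.name, event.reason)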

Over-provisioned cluster capacity:

Identifies clusters whose resource allocation is much lower than their actual capacity, impacting cost and optimization. The risk is triggered when Komodor detects over-provisioning of more than 50% in CPU or memory (or both). The threshold is configurable in Organization Settings -> Health Policies -> Reliability Policies.

As part of this risk, Komodor presents the percentage of over-provisioning (allocation vs. capacity) for each node pool (Karpenter) / instance type group (Cluster Autoscaler), and through that surfaces the potential CPU & memory optimization, both in capacity and in dollar terms.
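As a simple worked example (with made-up numbers), the over-provisioning percentage can be thought of as the share of capacity that is not allocated:

    # Example calculation with assumed values, not live cluster data.
    capacity_cpu = 64      # allocatable CPU in a node pool
    allocated_cpu = 20     # sum of CPU requests scheduled on that pool

    over_provisioning = (capacity_cpu - allocated_cpu) / capacity_cpu
    print(f"over-provisioning: {over_provisioning:.0%}")   # ~69%, above the default 50% threshold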

With all of the above information, Komodor leverages KlaudiaAI to provide smart recommendations for a more optimized mix and ratio of instance types.

Pending Pods: 

Surfaces pods in an unschedulable state for an extended period. This risk is triggered when pods are found pending for more than 5 minutes. The threshold is configurable in Organization Settings -> Health Policies -> Reliability Policies.

This risk provides a histogram of pods’ time-to-be-scheduled and lets you drill down into the most impacted services to identify any misconfiguration causing such scheduling delays.
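For a quick manual check of the same condition, the sketch below (an assumption, using pod creation time as a proxy for how long the pod has been unschedulable) lists pods that have been Pending for more than the default 5-minute threshold:

    # Hedged sketch: find pods that have been Pending for more than 5 minutes.
    from datetime import datetime, timedelta, timezone
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    threshold = timedelta(minutes=5)
    now = datetime.now(timezone.utc)

    for pod in v1.list_pod_for_all_namespaces(field_selector="status.phase=Pending").items:
        age = now - pod.metadata.creation_timestamp
        if age > threshold:
            print(f"{pod.metadata.namespace}/{pod.metadata.name} pending for {age}")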

 
