Workload Health

Overview

The Workload Health page in Komodor provides deep insights into the status and stability of your applications and services. It focuses on issues that affect workloads directly, including failures and misconfigurations, helping you ensure smooth operation across all running workloads.

What You’ll Find:

Real-Time Issues: This section lists immediate availability issues affecting Services, failed Jobs/CronJobs, and workflow runs with issues. (Read more about Monitors in Komodor)
- Types of live issues:
  1. Unhealthy services
  2. Failed Jobs/CronJobs
  3. Workflows with identified issues (Argo Workflows, Airflow, Kubeflow, and others)
- Actions: For each open issue, you can click through to view detailed information, investigate root causes, and acknowledge the issue for tracking purposes.
Reliability Risks: Komodor surfaces potential risks in workloads that could impact performance over time. (Read more about Reliability risks in Komodor)
- Types of risks:
  1. Degraded services - HPA Reached Max, Workload CPU throttling, Container restarts, Single point of failure
  2. Node Pressure - Under-provisioned workload, High request-to-limit ratio
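
To make the request-to-limit risk concrete, the sketch below compares a container's CPU limit with its CPU request. It assumes the risk refers to a limit that is many times larger than the request. The exact definition and threshold Komodor applies are not stated in this article, so the helper names and `RATIO_THRESHOLD` are hypothetical.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "500m" or "2" to cores."""
    if quantity.endswith("m"):
        # The "m" suffix means millicores: 1000m equals one core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


def limit_to_request_ratio(request: str, limit: str) -> float:
    """Return how many times larger the CPU limit is than the CPU request."""
    return parse_cpu(limit) / parse_cpu(request)


# Hypothetical threshold, chosen only for illustration.
RATIO_THRESHOLD = 4.0

ratio = limit_to_request_ratio(request="250m", limit="2")
is_risky = ratio > RATIO_THRESHOLD
print(f"limit/request ratio: {ratio}, flagged: {is_risky}")
```

A large gap means the scheduler places the workload based on a small request while the container may consume far more at runtime, which is one way nodes end up under pressure.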

The page offers flexible filtering options, allowing you to refine results according to your needs. You can filter by:

Category: Reliability risks or real-time issues
Status: Open, closed, or acknowledged
Clusters
Namespaces
Resource Type
Service Name
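
The filters above combine, so a result must match every selected value. As a rough illustration of that behavior, the sketch below filters a list of in-memory records. The `Violation` class and `filter_violations` function are hypothetical and do not represent Komodor's API or data model.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    # Hypothetical record mirroring the filter fields listed above.
    category: str
    status: str
    cluster: str
    namespace: str
    resource_type: str
    service_name: str


def filter_violations(violations, **criteria):
    """Keep only the violations whose fields equal every given criterion."""
    return [
        v for v in violations
        if all(getattr(v, field) == value for field, value in criteria.items())
    ]


violations = [
    Violation("real-time issue", "open", "prod", "payments", "Deployment", "api"),
    Violation("reliability risk", "open", "prod", "payments", "Deployment", "api"),
    Violation("reliability risk", "closed", "staging", "search", "Job", "indexer"),
]

open_risks = filter_violations(
    violations, category="reliability risk", status="open"
)
print([v.service_name for v in open_risks])
```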

Standards: The page suggests best practices for workloads, helping you optimize Kubernetes resource management. Misconfigurations such as missing resource limits, missing liveness probes, or a low replica count are flagged here, enabling you to address them proactively.
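
The misconfigurations named above can be illustrated with a small check against a Deployment manifest loaded as a Python dictionary. This is a minimal sketch, not how Komodor evaluates standards: the `check_standards` function and the `min_replicas` default of 2 are assumptions made for the example.

```python
def check_standards(deployment: dict, min_replicas: int = 2) -> list[str]:
    """Return findings for a Deployment manifest given as a dict."""
    findings = []
    spec = deployment.get("spec", {})

    # Kubernetes defaults a Deployment to 1 replica when the field is omitted.
    replicas = spec.get("replicas", 1)
    if replicas < min_replicas:
        findings.append(f"replicas: {replicas} is below the minimum of {min_replicas}")

    containers = spec.get("template", {}).get("spec", {}).get("containers", [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        if not container.get("resources", {}).get("limits"):
            findings.append(f"{name}: missing resource limits")
        if "livenessProbe" not in container:
            findings.append(f"{name}: missing liveness probe")
    return findings


# Example manifest with all three misconfigurations.
deployment = {
    "kind": "Deployment",
    "spec": {
        "replicas": 1,
        "template": {
            "spec": {
                "containers": [
                    {"name": "web", "resources": {"requests": {"cpu": "250m"}}}
                ]
            }
        },
    },
}

findings = check_standards(deployment)
for finding in findings:
    print(finding)
```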

Automatic Status Changes for Violations

In specific cases, Komodor automatically moves a violation from one status to another:

Acknowledged to Closed: This may happen if the violation is no longer relevant or if the associated resource has been deleted.

How Workload Health Helps You

Workload Health ensures your applications and services run smoothly by centralizing all workload-related issues and risks in one view. This makes it easier for you to:

Quickly resolve critical workload issues.
Maintain high availability across all services.
Proactively manage and optimize workload configurations.
