Workload Health

Overview

The Workload Health page in Komodor provides deep insights into the status and stability of your applications and services. It focuses on issues that affect workloads directly, including failures and misconfigurations, helping you ensure smooth operation across all running workloads.

workload health.png

What You’ll Find:

  1. Real-Time Issues: This section lists immediate availability issues affecting Services, failed Jobs/CronJobs, and workflow runs with issues. (Read more about Monitors in Komodor)
    • Types of live issues:
      1. Unhealthy services
      2. Failed Jobs/CronJobs
      3. Workflows with identified issues (Argo Workflows, Airflow, Kubeflow, and others)
    • Actions: For each open issue, you can click through to view detailed information, investigate root causes, and acknowledge the issue for tracking purposes.
      realtime issues workload.png
  2. Reliability Risks: Komodor surfaces potential risks in workloads that could impact performance over time. (Read more about Reliability risks in Komodor)
    • Types of risks:
      1. Degraded services - HPA Reached Max, Workload CPU throttling, Container restarts, Single point of failure 
      2. Node Pressure - Under-provisioned workload, High request-to-limit ratio
         
reliability section.png

The page offers flexible filtering options, allowing you to refine results according to your needs. You can filter by:

  • Category: Reliability risks or real-time issues
  • Status: Open, closed, or acknowledged
  • Clusters
  • Namespaces
  • Resource Type
  • Service Name
filters.png
  3. Standards: The page suggests best practices for workloads, helping you optimize Kubernetes resource management. Misconfigurations such as missing resource limits, missing liveness probes, or a low number of replicas are flagged here so that you can address them proactively.
    standards.png
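
To make the kinds of misconfiguration listed under Standards concrete, here is a minimal sketch in Python that checks a Deployment manifest (already loaded as a dictionary) for missing CPU or memory limits, a missing liveness probe, and a low replica count. It is an illustration only and does not show how Komodor evaluates workloads. The function name find_standard_violations and the min_replicas threshold of 2 are assumptions made for this example.

```python
from typing import Any


def find_standard_violations(deployment: dict[str, Any], min_replicas: int = 2) -> list[str]:
    """Return readable findings for a Deployment manifest given as a dict.

    Hypothetical helper for illustration; min_replicas=2 is an assumed threshold.
    """
    findings: list[str] = []
    spec = deployment.get("spec") or {}

    # Kubernetes defaults spec.replicas to 1 when the field is omitted.
    replicas = spec.get("replicas", 1)
    if replicas < min_replicas:
        findings.append(f"replicas={replicas} is below the assumed minimum of {min_replicas}")

    pod_spec = (spec.get("template") or {}).get("spec") or {}
    for container in pod_spec.get("containers") or []:
        name = container.get("name", "<unnamed>")
        limits = (container.get("resources") or {}).get("limits") or {}
        for resource in ("cpu", "memory"):
            if resource not in limits:
                findings.append(f"container {name}: missing {resource} limit")
        if "livenessProbe" not in container:
            findings.append(f"container {name}: missing liveness probe")
    return findings


# Example manifest with one replica, no memory limit, and no liveness probe.
example = {
    "kind": "Deployment",
    "spec": {
        "replicas": 1,
        "template": {
            "spec": {
                "containers": [
                    {"name": "api", "resources": {"limits": {"cpu": "500m"}}},
                ]
            }
        },
    },
}

for finding in find_standard_violations(example):
    print(finding)
```

Running a check like this against manifests before they are deployed can catch the same class of problems earlier, while the Standards section covers workloads that are already running.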

Automatic Status Changes for Violations

Komodor automatically moves violations from one status to another in specific cases:

Acknowledged to Closed: This may happen if the violation is no longer relevant or if the associated resource has been deleted.
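
The Acknowledged to Closed rule can be sketched as a small state transition. This is a simplified model written for illustration, not Komodor's implementation: the Violation class, its still_relevant and resource_exists fields, and the next_status function are all hypothetical names. The sketch models only the transition described above and leaves every other status unchanged, which is not a claim about how Komodor treats those statuses.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    CLOSED = "closed"


@dataclass
class Violation:
    """Hypothetical violation record used only for this sketch."""
    status: Status
    still_relevant: bool = True   # assumed flag: the condition is still observed
    resource_exists: bool = True  # assumed flag: the workload has not been deleted


def next_status(violation: Violation) -> Status:
    """Apply the single automatic transition described in the article."""
    no_longer_applies = not violation.still_relevant or not violation.resource_exists
    if violation.status is Status.ACKNOWLEDGED and no_longer_applies:
        return Status.CLOSED
    # All other cases are left as-is in this simplified model.
    return violation.status


acknowledged = Violation(status=Status.ACKNOWLEDGED, resource_exists=False)
print(next_status(acknowledged).value)
```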

How Workload Health Helps You

Workload Health ensures your applications and services run smoothly by centralizing all workload-related issues and risks in one view. This makes it easier for you to:

  • Quickly resolve critical workload issues.
  • Maintain high availability across all services.
  • Proactively manage and optimize workload configurations.
