Workflows 🔀

Overview

The Workflows feature in Komodor provides full visibility and monitoring of AI/ML workflows on Kubernetes. It allows you to track workflow runs and troubleshoot issues across various workflow engines. Workflows supports popular engines such as Argo Workflows, Spark and Apache Airflow out of the box, and custom workflows through specific pod labels.

With Workflows, you can:

  • View and monitor your workflows
  • Quickly identify issues with pods in past runs
  • Gain insight into related infrastructure events, such as node terminations or node issues, that may impact the workflow

[Image: workflow with a correlated node termination]

Accessing Workflows

  1. Navigate to Workflows:
    • Access Workflows under the Kubernetes AddOns section in the left sidebar. This section provides access to monitoring and configuration tools specifically for Kubernetes-based workloads.

  2. View the Workflows Dashboard:
    • The Workflows Dashboard displays all monitored workflows, with separate tabs for Argo Workflows, Airflow, Spark and Custom Engines.
    • Each tab includes a table listing workflows, organized by DAG/Template, with details like the latest run status, duration, and any identified issues.

[Image: Workflows Dashboard]

Out-of-the-Box Monitoring for Argo Workflows, Airflow and Spark

Argo Workflows, Airflow and Spark Monitoring:

  1. For workflows in Argo, Airflow and Spark, Komodor automatically monitors and displays workflows without additional setup. Simply navigate to the relevant tab to view active and historical runs.
  2. The dashboard displays workflows organized by DAG/Template and shows the status (Running/Completed) and any issues for the latest run.
    [Image: Airflow workflows list]
  3. Understanding Status Indicators:
    - Workflow statuses are calculated from the statuses of the pods within the workflow. Therefore, status and duration reflect pod-based data and may be updated with a 10-minute delay. Workflows display a Running or Completed status, with an indicator if issues were detected.
    [Image: latest run status]
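
To make the idea of a pod-based status more concrete, here is a minimal sketch of how pod phases could be rolled up into a Running/Completed status with an issue indicator. The roll-up rule and the `summarize_run` function are assumptions made for illustration; Komodor's actual calculation is not documented here and may differ. Only the pod phase names are standard Kubernetes values.

```python
from collections import Counter

# Standard Kubernetes pod phases.
ACTIVE_PHASES = {"Pending", "Running"}
PROBLEM_PHASES = {"Failed", "Unknown"}


def summarize_run(pod_phases):
    """Roll a list of pod phases up into (status, has_issues).

    Hypothetical rule, for illustration only:
    - status is "Running" if any pod is still Pending or Running,
      otherwise "Completed"
    - has_issues is True if any pod is Failed or Unknown
    """
    counts = Counter(pod_phases)
    still_active = any(counts[phase] for phase in ACTIVE_PHASES)
    has_issues = any(counts[phase] for phase in PROBLEM_PHASES)
    status = "Running" if still_active else "Completed"
    return status, has_issues


print(summarize_run(["Succeeded", "Running", "Failed"]))  # ('Running', True)
print(summarize_run(["Succeeded", "Succeeded"]))          # ('Completed', False)
```

A run can therefore appear as Running with an issue indicator at the same time, which matches the two-part display described above.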

Custom Workflow Support

Komodor can also monitor custom workflows once you add specific labels to your workflow’s pods.

  1. Adding Custom Workflow Labels:
    • To enable Komodor to identify and monitor custom workflows, label your pods with the following keys:
      • Workflow DAG ID: app.komodor.com/WorkflowDagId (e.g., data-processing-prod)
      • Workflow Engine: app.komodor.com/WorkflowEngine (e.g., MLFlow)
      • Workflow Run ID: app.komodor.com/WorkflowRunId (e.g., run-3235)
      • Workflow Task ID: app.komodor.com/WorkflowTaskId (e.g., validation-1121)
    [Image: custom workflow labels]
  2. Viewing Custom Workflows:
    1. Once labeled, custom workflows will appear under the Custom Engines tab in the Workflows Dashboard.
    2. Similar to Argo and Airflow, workflows are organized by DAG/Template, showing the latest run status and any issues identified.
    [Image: custom workflows list]
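
As an illustration of the four labels listed above, the following sketch builds a Pod manifest that carries them and prints it as JSON. The label keys and example values come from this article; the pod name, container name, image, and the `build_pod_manifest` helper are hypothetical placeholders. The value check reflects the general Kubernetes rule for label values (at most 63 characters, alphanumeric at both ends).

```python
import json
import re

# Label keys and example values as listed in this article.
KOMODOR_LABELS = {
    "app.komodor.com/WorkflowDagId": "data-processing-prod",
    "app.komodor.com/WorkflowEngine": "MLFlow",
    "app.komodor.com/WorkflowRunId": "run-3235",
    "app.komodor.com/WorkflowTaskId": "validation-1121",
}

# Kubernetes label values: empty, or up to 63 characters that start and end
# with an alphanumeric and contain only alphanumerics, '-', '_' or '.'.
LABEL_VALUE_RE = re.compile(r"([A-Za-z0-9]([A-Za-z0-9_.-]{0,61}[A-Za-z0-9])?)?")


def build_pod_manifest(name, image, labels):
    """Return a minimal Pod manifest (as a dict) carrying the given labels."""
    for key, value in labels.items():
        if not LABEL_VALUE_RE.fullmatch(value):
            raise ValueError(f"invalid label value for {key}: {value!r}")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "restartPolicy": "Never",
            # Hypothetical container; replace with your task's image.
            "containers": [{"name": "task", "image": image}],
        },
    }


manifest = build_pod_manifest(
    "validation-1121", "example.com/validator:latest", KOMODOR_LABELS
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML. In practice the labels would usually be set in whatever pod template your workflow engine uses, so that every task pod receives them.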

Workflow Pod Monitoring

The Workflow Pod Monitor is automatically enabled for each cluster, operating on a fault-based monitoring approach that tracks Argo, Airflow, and labeled pods. This monitor provides real-time insights into workflow performance and highlights any issues in pod execution.

For more detailed information, check out our Monitors guide.

Here’s an example of a workflow pod issue:
[Image: example of a workflow pod issue]

Workflow View

The workflow view allows you to observe and troubleshoot your workflows easily.

The screen contains:

Runs Dropdown:

For each workflow template, you can switch between different runs using the Runs Dropdown. This allows you to review historical runs and compare performance across instances.
[Image: Runs Dropdown]

Timeline and events views:

Each workflow run contains detailed information about pod phases, issues, and infrastructure events. This data allows you to identify potential bottlenecks, troubleshoot failures, and understand the health of each workflow.

  1. Tracking Issues with Pod Phases:
    For each workflow’s pods, Komodor tracks all pod phases and pod issues within the workflow’s timeline, showing each task as a 'swimlane' in the workflow view.
  2. Tracking Correlated Infrastructure Events:
    For each workflow, Komodor identifies correlated infrastructure events (e.g., node terminations, node issues) and displays them to give context to potential workflow disruptions.
[Image: workflow with a correlated node termination]
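
The sketch below shows one way such a correlation could work in principle: a node event is matched to a pod when it occurs on the pod's node during the pod's lifetime. The data classes, field names, the `correlate` function, and the one-minute slack window are all assumptions made for illustration, not Komodor's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PodRun:
    task_id: str     # hypothetical field: the workflow task this pod ran
    node: str        # node the pod was scheduled on
    start: datetime
    end: datetime


@dataclass
class NodeEvent:
    node: str
    kind: str        # e.g. "NodeTerminated"
    at: datetime


def correlate(pods, events, slack=timedelta(minutes=1)):
    """Return (task_id, event kind) pairs for events on a pod's node
    that occurred while the pod was alive, with a small slack window."""
    matches = []
    for pod in pods:
        for event in events:
            same_node = event.node == pod.node
            in_window = pod.start - slack <= event.at <= pod.end + slack
            if same_node and in_window:
                matches.append((pod.task_id, event.kind))
    return matches


t0 = datetime(2024, 10, 27, 15, 0)
pods = [
    PodRun("extract", "node-a", t0, t0 + timedelta(minutes=10)),
    PodRun("train", "node-b", t0 + timedelta(minutes=10), t0 + timedelta(minutes=40)),
]
events = [NodeEvent("node-b", "NodeTerminated", t0 + timedelta(minutes=25))]

print(correlate(pods, events))  # [('train', 'NodeTerminated')]
```

In this example only the `train` task is affected, because the termination happened on its node while it was running.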

Timeline capabilities:

  1. Toggle between timeline and event list views.
    [Image: timeline vs. list toggle]
  2. Toggle to show only pods with issues.
    [Image: show only pods with issues]

Please note: workflow data is retained for 3 days.


 
