Kubernetes

The kubernetes plugin monitors Kubernetes clusters by querying pod, deployment, node, and event state via kubectl. It detects unhealthy pods (CrashLoopBackOff, OOMKilled, ImagePullBackOff), degraded deployments, and not-ready nodes. It provides actions for restarting deployments, scaling replicas, deleting pods, and retrieving logs.

```typescript
import { createAgent } from 'zupdev';
import { kubernetes } from 'zupdev/plugins/kubernetes';

const agent = await createAgent({
  name: 'k8s-agent',
  plugins: [
    kubernetes({
      clusterName: 'production',
      namespaces: ['default', 'app'],
      pollIntervalMs: 60000,
      readOnly: false,
    }),
  ],
});
```

The plugin requires kubectl to be installed and available in the system PATH. It uses Bun.spawn to execute kubectl commands.
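Every kubectl call therefore runs as a child process with the kubeconfig and context settings applied. The sketch below shows one way such an invocation could be assembled from the kubeconfigPath, context, and timeoutMs options. buildKubectlInvocation and its return shape are hypothetical names, not the plugin's API, and the spawn step itself is left out so the code runs in any TypeScript runtime.

```typescript
// Hypothetical subset of the plugin options that affect how kubectl is invoked.
interface KubectlOptions {
  kubeconfigPath?: string;
  context?: string;
  timeoutMs?: number;
}

interface KubectlInvocation {
  cmd: string[];                           // argv, starting with "kubectl"
  env: Record<string, string | undefined>; // environment for the child process
  timeoutMs: number;                       // how long to wait before giving up
}

// Builds the argv and environment for one kubectl call (a sketch, not the plugin's code).
function buildKubectlInvocation(
  args: string[],
  opts: KubectlOptions,
  baseEnv: Record<string, string | undefined> = {},
): KubectlInvocation {
  const cmd = ['kubectl'];
  // context is passed as --context to every command
  if (opts.context) cmd.push('--context', opts.context);
  cmd.push(...args);

  // kubeconfigPath is exposed through KUBECONFIG; otherwise kubectl's default resolution applies
  const env = { ...baseEnv };
  if (opts.kubeconfigPath) env.KUBECONFIG = opts.kubeconfigPath;

  // 30000 ms matches the documented timeoutMs default
  return { cmd, env, timeoutMs: opts.timeoutMs ?? 30000 };
}

const inv = buildKubectlInvocation(['get', 'pods', '-n', 'app', '-o', 'json'], {
  kubeconfigPath: '/home/deploy/.kube/config',
  context: 'prod-cluster',
});
console.log(inv.cmd.join(' '));
// kubectl --context prod-cluster get pods -n app -o json
```

Under Bun, the result could be handed to Bun.spawn(inv.cmd, { env: inv.env }). Because an explicit env replaces the inherited environment, it is safest to pass process.env as baseEnv so that variables such as PATH and HOME still reach kubectl.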

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| kubeconfigPath | string | unset | Path to a kubeconfig file. Sets the KUBECONFIG environment variable for kubectl commands. If unset, kubectl uses its default kubeconfig resolution. |
| context | string | unset | kubectl context name, passed as --context to every kubectl command. |
| namespaces | string[] | ['default'] | Namespaces to monitor. Use ['*'] for all namespaces. |
| clusterName | string | 'default' | Human-readable cluster name used in observations and API responses. |
| pollIntervalMs | number | 60000 | Polling interval, in milliseconds, for the cluster health observer. |
| readOnly | boolean | unset (read-only) | When true or unset, mutating actions (restart, scale, delete) are disabled and return an error. Set to false to enable them. |
| labelSelector | string | unset | Kubernetes label selector applied to pod queries (e.g., 'app=myservice'). |
| maxPodsPerNamespace | number | unset | Maximum number of pods tracked per namespace, to avoid large payloads. |
| maxEventsPerPoll | number | 50 | Maximum number of warning events fetched per poll cycle. |
| excludeSystemNamespaces | boolean | unset (enabled) | When namespaces is ['*'], skips system namespaces (kube-system, kube-public, kube-node-lease, istio-system, cert-manager). Enabled unless explicitly set to false. |
| timeoutMs | number | 30000 | Timeout, in milliseconds, for each kubectl command. |
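As a reading aid, the defaults in the table can be written as a function that fills in unset fields. The interface and resolveOptions below are hypothetical and mirror the table rather than the plugin's source. Note that readOnly and excludeSystemNamespaces both treat an unset value as enabled, so only an explicit false turns them off.

```typescript
// Option shape mirroring the configuration table (a sketch, not the plugin's type).
interface KubernetesPluginOptions {
  kubeconfigPath?: string;
  context?: string;
  namespaces?: string[];
  clusterName?: string;
  pollIntervalMs?: number;
  readOnly?: boolean;
  labelSelector?: string;
  maxPodsPerNamespace?: number;
  maxEventsPerPoll?: number;
  excludeSystemNamespaces?: boolean;
  timeoutMs?: number;
}

// Fields that always carry a value once defaults are applied.
type ResolvedOptions = KubernetesPluginOptions &
  Required<Pick<KubernetesPluginOptions,
    'namespaces' | 'clusterName' | 'pollIntervalMs' | 'readOnly' |
    'maxEventsPerPoll' | 'excludeSystemNamespaces' | 'timeoutMs'>>;

function resolveOptions(opts: KubernetesPluginOptions = {}): ResolvedOptions {
  return {
    ...opts,
    namespaces: opts.namespaces ?? ['default'],
    clusterName: opts.clusterName ?? 'default',
    pollIntervalMs: opts.pollIntervalMs ?? 60000,
    // Read-only unless explicitly disabled: undefined counts as read-only.
    readOnly: opts.readOnly !== false,
    maxEventsPerPoll: opts.maxEventsPerPoll ?? 50,
    // Enabled unless explicitly set to false.
    excludeSystemNamespaces: opts.excludeSystemNamespaces !== false,
    timeoutMs: opts.timeoutMs ?? 30000,
  };
}

const resolved = resolveOptions({ clusterName: 'production' });
// resolved.readOnly is true and resolved.namespaces is ['default'],
// so write actions stay disabled until readOnly: false is set.
```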

The observer polls the cluster for pods, deployments, nodes, and warning events. It produces observations for:

  • Cluster health summary (kubernetes/cluster-health): Overall counts of pods, deployments, nodes, and warning events. Severity is warning if any issues are detected, info otherwise.
  • Unhealthy pods (kubernetes/unhealthy-pod): One observation per unhealthy pod. Detects CrashLoopBackOff, ImagePullBackOff, OOMKilled, Error states, and pods stuck in Pending for more than 5 minutes. Severity is error for CrashLoopBackOff and OOMKilled, warning for other issues.
  • Degraded deployments (kubernetes/degraded-deployment): One observation per deployment with unavailable or not-ready replicas. Severity is error if zero replicas are ready, warning otherwise.
  • Not-ready nodes (kubernetes/node-not-ready): One observation per node that is not in the Ready condition. Severity is error.
  • Connection errors (kubernetes/connection-error): Emitted when kubectl commands fail, for example because kubectl is not installed, authentication fails, or the cluster is unreachable.
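The pod and deployment rules above can be sketched against the JSON that kubectl get pods -o json and kubectl get deployments -o json return. The field names (status.containerStatuses, state.waiting.reason, lastState.terminated.reason, readyReplicas) are standard Kubernetes API fields; the function names, the PendingTooLong label, and the order in which reasons are checked are assumptions made for illustration, not the plugin's implementation.

```typescript
// Minimal subset of the Pod JSON returned by `kubectl get pods -o json`.
interface ContainerStatus {
  name: string;
  restartCount: number;
  state?: { waiting?: { reason?: string }; terminated?: { reason?: string } };
  lastState?: { terminated?: { reason?: string } };
}

interface Pod {
  metadata: { name: string; namespace: string; creationTimestamp: string };
  status: { phase: string; containerStatuses?: ContainerStatus[] };
}

type PodIssue = 'CrashLoopBackOff' | 'ImagePullBackOff' | 'OOMKilled' | 'Error' | 'PendingTooLong';
type Severity = 'error' | 'warning';

// "Stuck in Pending for more than 5 minutes"
const PENDING_LIMIT_MS = 5 * 60 * 1000;

// Returns the first issue found for a pod, or null if it looks healthy.
// Checking waiting reasons before termination reasons is an assumption.
function classifyPod(pod: Pod, now: Date): PodIssue | null {
  for (const cs of pod.status.containerStatuses ?? []) {
    const waiting = cs.state?.waiting?.reason;
    if (waiting === 'CrashLoopBackOff') return 'CrashLoopBackOff';
    if (waiting === 'ImagePullBackOff') return 'ImagePullBackOff';
    const current = cs.state?.terminated?.reason;
    const previous = cs.lastState?.terminated?.reason;
    // OOMKilled usually shows up in lastState after the container restarts.
    if (current === 'OOMKilled' || previous === 'OOMKilled') return 'OOMKilled';
    if (current === 'Error') return 'Error';
  }
  if (pod.status.phase === 'Pending') {
    const ageMs = now.getTime() - Date.parse(pod.metadata.creationTimestamp);
    if (ageMs > PENDING_LIMIT_MS) return 'PendingTooLong';
  }
  return null;
}

// error for CrashLoopBackOff and OOMKilled, warning for everything else
function podSeverity(issue: PodIssue): Severity {
  return issue === 'CrashLoopBackOff' || issue === 'OOMKilled' ? 'error' : 'warning';
}

interface DeploymentReplicaStatus {
  replicas: number;             // desired replica count
  readyReplicas?: number;       // omitted by the API server when zero
  unavailableReplicas?: number;
}

// null when healthy; error if zero replicas are ready, warning otherwise.
function deploymentSeverity(d: DeploymentReplicaStatus): Severity | null {
  const ready = d.readyReplicas ?? 0;
  const degraded = (d.unavailableReplicas ?? 0) > 0 || ready < d.replicas;
  if (!degraded) return null;
  return ready === 0 ? 'error' : 'warning';
}
```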

The analyzer examines Kubernetes observations and produces findings about cluster state:

  • Summarizes pod, deployment, and node counts
  • Lists each unhealthy pod with its issue type and restart count
  • Lists each degraded deployment with its ready/total replica count
  • Lists not-ready nodes
  • Sets contributingFactor based on the most significant issue (connection errors, node health, CrashLoopBackOff)
  • Confidence is 0.9 normally, 0.5 when there are connection errors
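The last two rules can be illustrated with a small function that picks a contributing factor and a confidence value. It assumes the issues listed in parentheses are ranked in that order (connection errors first, then node health, then CrashLoopBackOff); the counts interface, the function name, and the factor wording are hypothetical.

```typescript
// Hypothetical counts an analyzer could derive from one poll's observations.
interface ObservationCounts {
  connectionErrors: number;
  notReadyNodes: number;
  crashLoopPods: number;
}

interface FindingHeadline {
  contributingFactor: string | null;
  confidence: number;
}

// Picks the most significant issue, assuming the priority order
// connection errors > node health > CrashLoopBackOff pods.
function headline(c: ObservationCounts): FindingHeadline {
  let contributingFactor: string | null = null;
  if (c.connectionErrors > 0) {
    contributingFactor = 'kubectl could not reach the cluster';
  } else if (c.notReadyNodes > 0) {
    contributingFactor = `${c.notReadyNodes} node(s) not ready`;
  } else if (c.crashLoopPods > 0) {
    contributingFactor = `${c.crashLoopPods} pod(s) in CrashLoopBackOff`;
  }
  // Connection errors mean the poll may be incomplete, so confidence drops to 0.5.
  const confidence = c.connectionErrors > 0 ? 0.5 : 0.9;
  return { contributingFactor, confidence };
}
```

Ranking connection errors first makes sense because, when kubectl cannot reach the cluster, every other count may be stale or missing.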

Restarts a Kubernetes deployment using kubectl rollout restart.

  • Risk: medium
  • Autonomy: approval-required (minConfidence 0.8)
  • Parameters: namespace (string), deployment (string)
  • Requires readOnly: false
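A minimal sketch of the restart action's guard and command, assuming a hypothetical planRestart helper and error text. The kubectl rollout restart command comes from the description above; the guard mirrors the readOnly rule that mutating actions return an error unless readOnly is explicitly false.

```typescript
// Result of planning an action: either a command to run or an error.
type ActionResult =
  | { success: true; command: string[] }
  | { success: false; error: string };

function planRestart(readOnly: boolean | undefined, namespace: string, deployment: string): ActionResult {
  // Mutating actions are refused unless readOnly is explicitly false.
  if (readOnly !== false) {
    return { success: false, error: 'restart is disabled in read-only mode' };
  }
  return {
    success: true,
    command: ['kubectl', 'rollout', 'restart', `deployment/${deployment}`, '-n', namespace],
  };
}
```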

Scales a deployment to a specified number of replicas.

  • Risk: medium
  • Autonomy: approval-required (minConfidence 0.8)
  • Parameters: namespace (string), deployment (string), replicas (number)
  • Requires readOnly: false
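The documentation does not say which kubectl command the scale action runs. The sketch below assumes kubectl scale with --replicas, which is the standard CLI form, and adds input validation that is also an assumption. buildScaleArgs is a hypothetical name.

```typescript
// Builds a kubectl scale command after checking the replica count (sketch).
function buildScaleArgs(namespace: string, deployment: string, replicas: number): string[] {
  // Replicas must be a non-negative integer; 0 is allowed and stops every pod.
  if (!Number.isInteger(replicas) || replicas < 0) {
    throw new RangeError(`replicas must be a non-negative integer, got ${replicas}`);
  }
  return ['kubectl', 'scale', `deployment/${deployment}`, `--replicas=${replicas}`, '-n', namespace];
}
```

Because scaling to 0 takes a workload fully offline, the approval-required autonomy setting is a sensible default for this action.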

Deletes a pod to force a restart (the controller will recreate it).

  • Risk: medium
  • Autonomy: approval-required (minConfidence 0.8)
  • Parameters: namespace (string), pod (string), gracePeriodSeconds (number, optional)
  • Requires readOnly: false
  • Returns success if the pod is already deleted
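The "already deleted counts as success" rule can be sketched by inspecting kubectl's stderr, which reports a missing pod as Error from server (NotFound): pods "name" not found. The helper names are hypothetical, and mapping gracePeriodSeconds to kubectl's --grace-period flag is an assumption about how the plugin passes it.

```typescript
// Exit status and stderr captured from a finished kubectl process.
interface ProcessResult {
  exitCode: number;
  stderr: string;
}

function buildDeleteArgs(namespace: string, pod: string, gracePeriodSeconds?: number): string[] {
  const args = ['kubectl', 'delete', 'pod', pod, '-n', namespace];
  if (gracePeriodSeconds !== undefined) args.push(`--grace-period=${gracePeriodSeconds}`);
  return args;
}

// A pod that is already gone is treated as a successful delete.
function deleteSucceeded(res: ProcessResult): boolean {
  return res.exitCode === 0 || /\(NotFound\)/.test(res.stderr);
}
```

kubectl delete also accepts --ignore-not-found, which achieves the same effect at the CLI level without parsing stderr.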

Retrieves logs from a pod. This is a read-only action and does not require readOnly: false.

  • Risk: low
  • Autonomy: auto (minConfidence 0.5)
  • Parameters: namespace (string), pod (string), container (string, optional), tailLines (number, optional, 1-1000), sinceSeconds (number, optional)
  • Truncates output to 50KB if logs are large
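The parameters map naturally onto kubectl logs flags (-c, --tail, --since), although the documentation does not confirm that mapping. This sketch rejects out-of-range tailLines values (the plugin might clamp them instead), reads "50KB" as 50 × 1024 bytes, and keeps the beginning of the output when truncating. All three choices are assumptions, and buildLogsArgs and truncateLogs are hypothetical names.

```typescript
interface LogOptions {
  container?: string;
  tailLines?: number;
  sinceSeconds?: number;
}

// Assumption: "50KB" is read as 50 * 1024 bytes.
const MAX_LOG_BYTES = 50 * 1024;

function buildLogsArgs(namespace: string, pod: string, o: LogOptions = {}): string[] {
  const args = ['kubectl', 'logs', pod, '-n', namespace];
  if (o.container) args.push('-c', o.container);
  if (o.tailLines !== undefined) {
    // Documented range is 1-1000.
    if (!Number.isInteger(o.tailLines) || o.tailLines < 1 || o.tailLines > 1000) {
      throw new RangeError('tailLines must be an integer between 1 and 1000');
    }
    args.push(`--tail=${o.tailLines}`);
  }
  if (o.sinceSeconds !== undefined) args.push(`--since=${o.sinceSeconds}s`);
  return args;
}

// Truncates to a byte budget without splitting a multi-byte UTF-8 character.
function truncateLogs(text: string, maxBytes: number = MAX_LOG_BYTES): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return text;
  // stream: true holds back a trailing partial sequence instead of emitting U+FFFD.
  return new TextDecoder().decode(bytes.slice(0, maxBytes), { stream: true });
}
```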

All endpoints require authentication (Bearer token) by default.
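A minimal sketch of an authenticated call, assuming the key configured under api.auth.apiKeys (as in the complete configuration at the end of this page) is sent as the Bearer token and the API listens on localhost:3000. The helper k8sApiRequest is hypothetical; it only builds a standard Fetch API Request, so it can be inspected before being sent.

```typescript
// Builds (but does not send) an authenticated request to the agent's REST API.
function k8sApiRequest(baseUrl: string, token: string, path: string, init: RequestInit = {}): Request {
  const headers = new Headers(init.headers);
  headers.set('Authorization', `Bearer ${token}`);
  return new Request(new URL(path, baseUrl).toString(), { ...init, headers });
}

// 'my-api-key' stands in for the value of process.env.API_KEY.
const req = k8sApiRequest('http://localhost:3000', 'my-api-key', '/k8s/namespaces/app/deployments');
// To send it: const body = await (await fetch(req)).json();
```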

Returns the current cluster state summary from the most recent poll.

Response:

```json
{
  "clusterName": "production",
  "lastPollTime": "2025-06-15T10:30:00.000Z",
  "summary": {
    "clusterName": "production",
    "namespaces": ["default", "app"],
    "pods": { "total": 42, "healthy": 40, "unhealthy": 2, "byPhase": { "Running": 40, "Pending": 2 } },
    "deployments": { "total": 12, "healthy": 11, "degraded": 1 },
    "nodes": { "total": 3, "ready": 3, "notReady": 0 },
    "recentWarningEvents": 5
  },
  "unhealthyPods": 2,
  "degradedDeployments": 1,
  "notReadyNodes": 0
}
```
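For typed clients, the response can be described with an interface inferred from the example above; which fields are optional is not documented, so every field is treated as required here. needsAttention is a hypothetical helper showing how the top-level counters could drive an alert.

```typescript
// Shape inferred from the example response; not an official type.
interface ClusterStatusResponse {
  clusterName: string;
  lastPollTime: string;
  summary: {
    clusterName: string;
    namespaces: string[];
    pods: { total: number; healthy: number; unhealthy: number; byPhase: Record<string, number> };
    deployments: { total: number; healthy: number; degraded: number };
    nodes: { total: number; ready: number; notReady: number };
    recentWarningEvents: number;
  };
  unhealthyPods: number;
  degradedDeployments: number;
  notReadyNodes: number;
}

// Lists human-readable problems; an empty array means nothing needs attention.
function needsAttention(s: ClusterStatusResponse): string[] {
  const problems: string[] = [];
  if (s.unhealthyPods > 0) problems.push(`${s.unhealthyPods} unhealthy pod(s)`);
  if (s.degradedDeployments > 0) problems.push(`${s.degradedDeployments} degraded deployment(s)`);
  if (s.notReadyNodes > 0) problems.push(`${s.notReadyNodes} not-ready node(s)`);
  return problems;
}
```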

Lists all namespaces in the cluster.

Response:

```json
{
  "namespaces": [
    { "name": "default", "status": "Active" },
    { "name": "app", "status": "Active" }
  ]
}
```

Lists pods in a specific namespace with their status, containers, and restart counts.

GET /k8s/namespaces/:namespace/deployments

Lists deployments in a specific namespace with replica counts.

GET /k8s/namespaces/:namespace/pods/:pod/logs

Retrieves logs for a specific pod.

Query parameters:

  • container — container name (optional)
  • tail — number of lines to return (default: 100)
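Building the logs URL with URLSearchParams keeps the optional query parameters correctly encoded. logsUrl and the pod name in the usage line are hypothetical; the path and the container and tail parameters come from the endpoint description above.

```typescript
// Builds the logs endpoint URL; omitted options fall back to server defaults (tail: 100).
function logsUrl(
  baseUrl: string,
  namespace: string,
  pod: string,
  opts: { container?: string; tail?: number } = {},
): string {
  const path = `/k8s/namespaces/${encodeURIComponent(namespace)}/pods/${encodeURIComponent(pod)}/logs`;
  const url = new URL(path, baseUrl);
  if (opts.container) url.searchParams.set('container', opts.container);
  if (opts.tail !== undefined) url.searchParams.set('tail', String(opts.tail));
  return url.toString();
}

const logsEndpoint = logsUrl('http://localhost:3000', 'app', 'api-7d9f-abc12', { container: 'api', tail: 200 });
```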

POST /k8s/namespaces/:namespace/deployments/:deployment/restart

Triggers a rollout restart for a deployment. Requires readOnly: false.

POST /k8s/namespaces/:namespace/deployments/:deployment/scale

Scales a deployment. Requires readOnly: false.

Request body:

```json
{ "replicas": 3 }
```
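A sketch of the scale call as a standard Fetch API Request, assuming the API key is sent as the Bearer token. scaleRequest is a hypothetical helper; the path and the { "replicas": N } body come from the endpoint description.

```typescript
// Builds a POST request that scales a deployment to the given replica count.
function scaleRequest(
  baseUrl: string,
  token: string,
  namespace: string,
  deployment: string,
  replicas: number,
): Request {
  const path = `/k8s/namespaces/${encodeURIComponent(namespace)}/deployments/${encodeURIComponent(deployment)}/scale`;
  return new Request(new URL(path, baseUrl).toString(), {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ replicas }),
  });
}

const scaleReq = scaleRequest('http://localhost:3000', 'my-api-key', 'app', 'api', 3);
// Send with fetch(scaleReq); the request fails unless the plugin runs with readOnly: false.
```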

DELETE /k8s/namespaces/:namespace/pods/:pod

Deletes a pod. Requires readOnly: false.

```typescript
import { createAgent } from 'zupdev';
import { kubernetes } from 'zupdev/plugins/kubernetes';

const agent = await createAgent({
  name: 'k8s-monitor',
  mode: 'continuous',
  loopInterval: 30000,
  api: {
    port: 3000,
    auth: {
      // Clients send this key as a Bearer token
      apiKeys: [{ key: process.env.API_KEY!, name: 'admin' }],
    },
  },
  plugins: [
    kubernetes({
      clusterName: 'production',
      kubeconfigPath: '/home/deploy/.kube/config',
      context: 'prod-cluster',
      namespaces: ['default', 'app', 'workers'],
      pollIntervalMs: 60000,
      // Enables restart, scale, and delete (each still requires approval)
      readOnly: false,
      excludeSystemNamespaces: true,
      maxPodsPerNamespace: 100,
      maxEventsPerPoll: 50,
      timeoutMs: 30000,
    }),
  ],
});

// Serve the REST API, then start the continuous monitoring loop
const server = agent.startApi({ port: 3000 });
await agent.start();
```

This configuration monitors three namespaces every 60 seconds, detecting CrashLoopBackOff pods, degraded deployments, and not-ready nodes. Write actions (restart, scale, delete) are enabled, but each still requires approval because its autonomy mode is approval-required. The REST API on port 3000 supports on-demand queries and manual interventions. Setting excludeSystemNamespaces: true has no effect here, because that flag only applies when namespaces is ['*'].
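The namespace selection rule can be sketched as follows. It assumes the wildcard is recognized only when namespaces is exactly ['*'] and that the cluster's namespace list has already been fetched; namespacesToWatch is a hypothetical name. The system namespace list is the one given in the configuration table.

```typescript
// System namespaces skipped when namespaces is ['*'] and exclusion is enabled.
const SYSTEM_NAMESPACES = new Set([
  'kube-system', 'kube-public', 'kube-node-lease', 'istio-system', 'cert-manager',
]);

function namespacesToWatch(
  configured: string[],
  clusterNamespaces: string[],
  excludeSystem?: boolean,
): string[] {
  // Explicit lists are used as-is; excludeSystemNamespaces is ignored.
  if (!(configured.length === 1 && configured[0] === '*')) return configured;
  // Exclusion is on unless explicitly set to false.
  if (excludeSystem === false) return clusterNamespaces;
  return clusterNamespaces.filter((ns) => !SYSTEM_NAMESPACES.has(ns));
}
```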