Skip to content

Fly Machines

The fly-machines plugin monitors Fly.io machines across multiple apps. It tracks machine state, health checks, and detects deployments by watching for instance ID and image digest changes. Unlike platforms with a dedicated deployments API, Fly.io deployments are inferred from machine-level changes — when a machine’s instance_id changes, a new deployment has occurred.

import { createAgent } from 'zupdev';
import { flyMachines } from 'zupdev/plugins/fly-machines';
const agent = await createAgent({
name: 'fly-agent',
plugins: [
flyMachines({
auth: { token: process.env.FLY_API_TOKEN! },
apps: [
{
name: 'my-app',
serviceName: 'My App',
},
],
}),
],
});

A Fly.io API token is required. Generate one with fly tokens create from the Fly CLI.

FieldTypeDefaultDescription
authFlyAuthConfigRequired. Authentication configuration.
auth.tokenstringRequired. Fly.io API token.
appsFlyAppConfig[]Required. Apps to monitor. At least one app must be configured.
pollIntervalMsnumber60000Polling interval in milliseconds.
maxMachinesPerAppnumber50Maximum machines to track per app.
apiBaseUrlstring'https://api.machines.dev'Fly Machines API base URL.

Each app describes a Fly.io application to monitor:

FieldTypeRequiredDescription
namestringYesFly.io app name (as shown in fly apps list).
serviceNamestringYesHuman-readable service name for SRE context (used in observations and findings).
regionsstring[]NoFilter machines by specific regions (e.g., ['ord', 'cdg']). If not set, all regions are included.
metadataRecord<string, string>NoFilter machines by metadata key-value pairs.

The observer polls the Fly Machines API for each configured app and produces two types of observations:

Deployment events (fly-machines/deployment): Emitted when machines are updated (detected via instance_id changes). Each deployment event includes:

  • App and service names
  • New image digest, repository, and tag
  • Machines and regions affected
  • Deployment status: completed (all machines started), in_progress, partial (mixed success/failure), or failed
  • Severity: info for completed, critical for failed, error for partial

Machine state (fly-machines/machine): One observation per machine with:

  • Machine ID, name, state, region, and instance ID
  • Image reference (digest, repository, tag)
  • Guest configuration (CPU, memory)
  • Health check results
  • Most recent machine event
  • Severity: warning if the machine is stopped, destroyed, or has failing health checks; info otherwise

API errors (fly-machines/error): Emitted when the Fly API call fails for an app, with warning severity.

Analyzes Fly.io observations to provide deployment and health context:

  • Reports deployment status (completed, in progress, partial, failed) with machine counts and affected regions
  • Groups machines by service and reports running/total counts per region
  • Identifies stopped or suspended machines
  • Reports machines with failing health checks
  • Sets contributingFactor when deployment failures or health check failures are detected
  • Confidence: 0.85

All endpoints require authentication by default.

Lists all configured Fly.io apps with machine status summary.

Response:

{
"apps": [
{
"name": "my-app",
"serviceName": "My App",
"configuredRegions": ["ord", "cdg"],
"activeRegions": ["ord", "cdg"],
"lastFetchTime": "2025-06-15T10:30:00.000Z",
"machineCount": 4,
"runningCount": 4,
"isConsistent": true,
"currentImageDigest": "sha256:abc123...",
"imageDigestCount": 1
}
]
}

The isConsistent field indicates whether all machines are running the same image digest. When false, a deployment may be in progress or partially failed.

Returns detailed machine information for a specific app.

Response:

{
"app": {
"name": "my-app",
"serviceName": "My App"
},
"machines": [
{
"id": "e784079b449483",
"name": "my-app-machine-1",
"state": "started",
"region": "ord",
"instanceId": "01HXYZ...",
"imageRef": {
"repository": "registry.fly.io/my-app",
"tag": "deployment-01HXYZ",
"digest": "sha256:abc123..."
},
"guest": {
"cpu_kind": "shared",
"cpus": 1,
"memory_mb": 256
},
"checks": {
"http": {
"name": "http",
"status": "passing",
"output": "HTTP 200",
"updated_at": "2025-06-15T10:29:00.000Z"
}
},
"recentEvents": [
{
"type": "start",
"status": "started",
"timestamp": "2025-06-15T10:00:00.000Z"
}
],
"createdAt": "2025-06-01T00:00:00.000Z",
"updatedAt": "2025-06-15T10:00:00.000Z"
}
],
"lastFetchTime": "2025-06-15T10:30:00.000Z"
}

The plugin detects deployments by comparing each machine’s instance_id against its previously known value. When an instance ID changes, the machine has been updated — typically as part of a deployment.

Machines with the same new image digest are grouped into a single deployment event. The deployment status is determined by the states of the affected machines:

StatusCondition
completedAll affected machines are in the started state
in_progressMachines are still transitioning (not all started, none failed)
partialSome machines started, some failed (destroyed or replacing)
failedAll affected machines failed
import { createAgent } from 'zupdev';
import { flyMachines } from 'zupdev/plugins/fly-machines';
const agent = await createAgent({
name: 'fly-monitor',
mode: 'continuous',
loopInterval: 30000,
api: {
port: 3000,
auth: {
apiKeys: [{ key: process.env.API_KEY!, name: 'admin' }],
},
},
plugins: [
flyMachines({
auth: { token: process.env.FLY_API_TOKEN! },
pollIntervalMs: 60000,
maxMachinesPerApp: 50,
apps: [
{
name: 'api-prod',
serviceName: 'API (Production)',
regions: ['ord', 'cdg', 'nrt'],
},
{
name: 'worker-prod',
serviceName: 'Background Worker',
metadata: { role: 'worker' },
},
{
name: 'api-staging',
serviceName: 'API (Staging)',
regions: ['ord'],
},
],
}),
],
});
const server = agent.startApi({ port: 3000 });
await agent.start();

Three Fly.io apps are monitored across multiple regions. The API production app is filtered to three specific regions; the worker app filters by metadata. Deployments are detected automatically when machines update their instance IDs. Machine health checks are tracked and surfaced as observations. The REST API on port 3000 provides on-demand machine and deployment status.