Skip to content

Cloud Run

The cloud-run plugin monitors Google Cloud Run services across multiple projects and regions. It tracks service health, revision rollouts, and can shift traffic for auto-rollback when new revisions fail. It optionally ingests Cloud Logging error logs and Cloud Monitoring metrics as additional observations.

import { createAgent } from 'zupdev';
import { cloudRun } from 'zupdev/plugins/cloud-run';
const agent = await createAgent({
name: 'cloud-run-agent',
plugins: [
cloudRun({
projects: [
{
projectId: 'my-project',
regions: ['us-central1'],
},
],
}),
],
});

The plugin uses Google Application Default Credentials (ADC) by default. Ensure credentials are available in the environment (e.g., via GOOGLE_APPLICATION_CREDENTIALS or running on GCE/Cloud Shell).

The google-auth-library package must be installed for authentication.

FieldTypeDefaultDescription
authCloudRunAuthConfig{ useADC: true }Authentication configuration. Uses Application Default Credentials by default.
auth.useADCbooleantrueWhether to use Application Default Credentials.
auth.scopesstring[]['https://www.googleapis.com/auth/cloud-platform']OAuth scopes to request when using ADC.
projectsCloudRunProjectConfig[]Required. Projects and regions to monitor. At least one project must be configured.
pollIntervalMsnumber60000Polling interval in milliseconds.
readOnlybooleantrueWhen true, traffic-shifting actions are disabled. Set to false to enable rollback actions.
autoRollbackbooleanfalseEnable automatic rollback when a new revision fails to become ready.
autoRollbackMinReadyMinutesnumber5Minimum minutes a new revision must be ready before a rollout is considered failed.
maxRevisionsPerServicenumber20Maximum revisions to fetch per service.
includeLogsbooleanfalseInclude Cloud Logging error observations.
logQueryWindowMinutesnumber10Time window for log queries in minutes.
logPageSizenumber50Maximum log entries to fetch per service.
includeMetricsbooleanfalseInclude Cloud Monitoring metrics observations.
metricsWindowMinutesnumber5Time window for metrics queries in minutes.
errorRateWarningThresholdnumber0.05Error rate above which a warning observation is emitted.
errorRateErrorThresholdnumber0.1Error rate above which an error observation is emitted.

Each project describes a GCP project and its regions to monitor:

FieldTypeRequiredDescription
projectIdstringYesGCP project ID.
regionsstring[]YesRegions to monitor (e.g., ['us-central1', 'europe-west1']).
servicesstring[]NoAllowlist of service names. If set, only these services are monitored.
labelsRecord<string, string>NoLabel filter. Service labels must include all provided key-value pairs.
serviceNameMapRecord<string, string>NoMapping of Cloud Run service name to a human-readable display name.

The observer polls Cloud Run APIs and produces observations for:

  • Service status: Tracks each service’s conditions (Ready, ConfigurationsReady, RoutesReady), traffic allocation, and latest revision state.
  • Rollout detection: Detects when a new revision is created (latestCreatedRevision differs from latestReadyRevision), tracks rollout age, and determines rollout status (in_progress, completed, failed).
  • Log observations (when includeLogs is enabled): Fetches recent error-severity log entries from Cloud Logging for each monitored service.
  • Metric observations (when includeMetrics is enabled): Fetches request count and error rate metrics from Cloud Monitoring. Emits warning or error severity based on configured thresholds.

Analyzes Cloud Run observations and produces findings about:

  • Service health status and conditions
  • Active rollout progress and duration
  • Failed rollouts with error messages
  • Error rate trends when metrics are enabled

Sets contributingFactor when rollout failures or elevated error rates are detected.

When autoRollback is enabled and a rollout is detected as failed (the new revision has not become ready within autoRollbackMinReadyMinutes), the decision strategy proposes shifting traffic back to the last known good revision.

  • rollback: Shifts 100% of traffic to the last known good revision. Requires readOnly: false.
  • set-traffic: Sets traffic allocation across revisions. Requires readOnly: false.
  • deploy-revision: Deploys a new revision with a specified image. Requires readOnly: false.

All endpoints require authentication by default.

Lists all monitored Cloud Run services with their current state.

Response:

{
"services": [
{
"key": "my-project/us-central1/api-service",
"projectId": "my-project",
"region": "us-central1",
"service": "api-service",
"serviceName": "API Service",
"url": "https://api-service-xyz.a.run.app",
"latestReadyRevision": "api-service-00042-abc",
"latestCreatedRevision": "api-service-00042-abc",
"traffic": [{ "revision": "api-service-00042-abc", "percent": 100 }],
"rolloutStatus": "completed",
"updatedAt": "2025-06-15T10:30:00.000Z"
}
]
}

Returns detailed state for a specific service (matched by service name within the key).

Triggers a rollback to the last known good revision by shifting 100% of traffic. Requires readOnly: false.

Response:

{
"success": true,
"message": "Rolled back api-service to revision api-service-00041-def"
}

Sets traffic allocation for a service. Requires readOnly: false.

Request body:

{
"traffic": [
{ "revision": "api-service-00041-def", "percent": 90 },
{ "revision": "api-service-00042-abc", "percent": 10 }
]
}
import { createAgent } from 'zupdev';
import { cloudRun } from 'zupdev/plugins/cloud-run';
const agent = await createAgent({
name: 'cloud-run-monitor',
mode: 'continuous',
loopInterval: 30000,
api: {
port: 3000,
auth: {
apiKeys: [{ key: process.env.API_KEY!, name: 'admin' }],
},
},
plugins: [
cloudRun({
projects: [
{
projectId: 'my-production-project',
regions: ['us-central1', 'europe-west1'],
services: ['api-service', 'worker-service'],
serviceNameMap: {
'api-service': 'API Gateway',
'worker-service': 'Background Worker',
},
},
{
projectId: 'my-staging-project',
regions: ['us-central1'],
labels: { env: 'staging' },
},
],
pollIntervalMs: 60000,
readOnly: false,
autoRollback: true,
autoRollbackMinReadyMinutes: 5,
includeLogs: true,
logQueryWindowMinutes: 10,
includeMetrics: true,
metricsWindowMinutes: 5,
errorRateWarningThreshold: 0.05,
errorRateErrorThreshold: 0.1,
}),
],
});
const server = agent.startApi({ port: 3000 });
await agent.start();

Two GCP projects are monitored across multiple regions. Production services are explicitly listed; staging services are discovered by label. Auto-rollback is enabled — if a new revision fails to become ready within 5 minutes, traffic shifts back to the last known good revision. Cloud Logging and Cloud Monitoring are both enabled for additional context. The REST API on port 3000 allows manual rollbacks and traffic adjustments.