Cloud Run
The cloud-run plugin monitors Google Cloud Run services across multiple projects and regions. It tracks service health and revision rollouts, and can shift traffic to roll back automatically when a new revision fails. It can optionally ingest Cloud Logging error logs and Cloud Monitoring metrics as additional observations.
Installation
```ts
import { createAgent } from 'zupdev';
import { cloudRun } from 'zupdev/plugins/cloud-run';

// Top-level await requires an ES module context.
const agent = await createAgent({
  name: 'cloud-run-agent',
  plugins: [
    cloudRun({
      projects: [
        {
          projectId: 'my-project',
          regions: ['us-central1'],
        },
      ],
    }),
  ],
});
```
Requirements
The plugin uses Google Application Default Credentials (ADC) by default. Ensure credentials are available in the environment (e.g., via GOOGLE_APPLICATION_CREDENTIALS or by running on GCE/Cloud Shell).
The google-auth-library package must be installed for authentication.
Plugin options
| Field | Type | Default | Description |
|---|---|---|---|
| auth | CloudRunAuthConfig | { useADC: true } | Authentication configuration. Uses Application Default Credentials by default. |
| auth.useADC | boolean | true | Whether to use Application Default Credentials. |
| auth.scopes | string[] | ['https://www.googleapis.com/auth/cloud-platform'] | OAuth scopes to request when using ADC. |
| projects | CloudRunProjectConfig[] | none | Required. Projects and regions to monitor. At least one project must be configured. |
| pollIntervalMs | number | 60000 | Polling interval in milliseconds. |
| readOnly | boolean | true | When true, traffic-shifting actions are disabled. Set to false to enable rollback actions. |
| autoRollback | boolean | false | Enable automatic rollback when a new revision fails to become ready. |
| autoRollbackMinReadyMinutes | number | 5 | Minutes a new revision has to become ready before its rollout is considered failed. |
| maxRevisionsPerService | number | 20 | Maximum number of revisions to fetch per service. |
| includeLogs | boolean | false | Include Cloud Logging error observations. |
| logQueryWindowMinutes | number | 10 | Time window for log queries, in minutes. |
| logPageSize | number | 50 | Maximum number of log entries to fetch per service. |
| includeMetrics | boolean | false | Include Cloud Monitoring metrics observations. |
| metricsWindowMinutes | number | 5 | Time window for metrics queries, in minutes. |
| errorRateWarningThreshold | number | 0.05 | Error rate (fraction of requests, 0 to 1) above which a warning observation is emitted. |
| errorRateErrorThreshold | number | 0.1 | Error rate (fraction of requests, 0 to 1) above which an error observation is emitted. |
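The sketch below shows how the options above could be resolved to make the defaults concrete. resolveOptions and the interfaces are hypothetical illustrations that only restate the table. They are not exports of zupdev/plugins/cloud-run, and the real plugin may validate its input differently.

```typescript
// Hypothetical shapes mirroring the option tables; not the plugin's real types.
interface ProjectConfig {
  projectId: string;
  regions: string[];
  services?: string[];
  labels?: Record<string, string>;
  serviceNameMap?: Record<string, string>;
}

interface CloudRunOptions {
  auth?: { useADC?: boolean; scopes?: string[] };
  projects: ProjectConfig[];
  pollIntervalMs?: number;
  readOnly?: boolean;
  autoRollback?: boolean;
  autoRollbackMinReadyMinutes?: number;
  maxRevisionsPerService?: number;
  includeLogs?: boolean;
  logQueryWindowMinutes?: number;
  logPageSize?: number;
  includeMetrics?: boolean;
  metricsWindowMinutes?: number;
  errorRateWarningThreshold?: number;
  errorRateErrorThreshold?: number;
}

type ResolvedOptions = Required<Omit<CloudRunOptions, 'auth'>> & {
  auth: { useADC: boolean; scopes: string[] };
};

// Applies the documented defaults and rejects an empty project list.
function resolveOptions(opts: CloudRunOptions): ResolvedOptions {
  if (!opts.projects || opts.projects.length === 0) {
    throw new Error('cloud-run: at least one project must be configured');
  }
  return {
    auth: {
      useADC: opts.auth?.useADC ?? true,
      scopes: opts.auth?.scopes ?? ['https://www.googleapis.com/auth/cloud-platform'],
    },
    projects: opts.projects,
    pollIntervalMs: opts.pollIntervalMs ?? 60000,
    readOnly: opts.readOnly ?? true, // safe by default: no traffic actions
    autoRollback: opts.autoRollback ?? false,
    autoRollbackMinReadyMinutes: opts.autoRollbackMinReadyMinutes ?? 5,
    maxRevisionsPerService: opts.maxRevisionsPerService ?? 20,
    includeLogs: opts.includeLogs ?? false,
    logQueryWindowMinutes: opts.logQueryWindowMinutes ?? 10,
    logPageSize: opts.logPageSize ?? 50,
    includeMetrics: opts.includeMetrics ?? false,
    metricsWindowMinutes: opts.metricsWindowMinutes ?? 5,
    errorRateWarningThreshold: opts.errorRateWarningThreshold ?? 0.05,
    errorRateErrorThreshold: opts.errorRateErrorThreshold ?? 0.1,
  };
}
```

Using ?? instead of || keeps explicit falsy values such as readOnly: false or a threshold of 0.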
Project configuration
Each project entry describes a GCP project and the regions to monitor in it:

| Field | Type | Required | Description |
|---|---|---|---|
| projectId | string | Yes | GCP project ID. |
| regions | string[] | Yes | Regions to monitor (e.g., ['us-central1', 'europe-west1']). |
| services | string[] | No | Allowlist of service names. If set, only these services are monitored. |
| labels | Record<string, string> | No | Label filter. A service is monitored only if its labels include every provided key-value pair. |
| serviceNameMap | Record<string, string> | No | Maps a Cloud Run service name to a human-readable display name. |
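The allowlist and label rules can be read as a single predicate. This sketch assumes that when services and labels are both set, a service must satisfy both; the table does not say how they combine. isMonitored, displayName, and the types are hypothetical names.

```typescript
// Hypothetical types; field names follow the project configuration table.
interface ProjectFilter {
  services?: string[];
  labels?: Record<string, string>;
  serviceNameMap?: Record<string, string>;
}

interface DiscoveredService {
  name: string;
  labels: Record<string, string>;
}

// A service passes if it is on the allowlist (when one is set) AND carries
// every configured label with exactly the configured value (assumed AND).
function isMonitored(svc: DiscoveredService, filter: ProjectFilter): boolean {
  if (filter.services && !filter.services.includes(svc.name)) return false;
  for (const [key, value] of Object.entries(filter.labels ?? {})) {
    if (svc.labels[key] !== value) return false;
  }
  return true;
}

// Falls back to the raw Cloud Run service name when no display name is mapped.
function displayName(svc: DiscoveredService, filter: ProjectFilter): string {
  return filter.serviceNameMap?.[svc.name] ?? svc.name;
}
```

Extra labels on a service do not disqualify it. Only the configured pairs are checked.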
OODA phase contributions
Observe: service health
The observer polls the Cloud Run APIs and produces observations for:
- Service status: Tracks each service’s conditions (Ready, ConfigurationsReady, RoutesReady), traffic allocation, and latest revision state.
- Rollout detection: Detects when a new revision is created (latestCreatedRevision differs from latestReadyRevision), tracks rollout age, and determines rollout status (in_progress, completed, failed).
- Log observations (when includeLogs is enabled): Fetches recent error-severity log entries from Cloud Logging for each monitored service.
- Metric observations (when includeMetrics is enabled): Fetches request count and error rate metrics from Cloud Monitoring, and emits warning or error severity based on the configured thresholds.
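Rollout detection comes down to comparing two revision names and checking the age of the newest revision. The sketch below rests on stated assumptions and is not the plugin's code. detectRollout and ServiceSnapshot are hypothetical, and the failure rule (unready for at least failAfterMinutes) borrows the auto-rollback criterion described under Decide.

```typescript
// Hypothetical sketch of rollout detection; not the plugin's implementation.
type RolloutStatus = 'in_progress' | 'completed' | 'failed';

interface ServiceSnapshot {
  latestCreatedRevision: string;
  latestReadyRevision: string;
  // Creation time of latestCreatedRevision, as an ISO 8601 timestamp.
  latestCreatedAt: string;
}

interface Rollout {
  status: RolloutStatus;
  ageMinutes: number;
}

function detectRollout(svc: ServiceSnapshot, now: Date, failAfterMinutes: number): Rollout {
  // Rollout age: minutes since the newest revision was created.
  const ageMinutes = (now.getTime() - Date.parse(svc.latestCreatedAt)) / 60000;
  // The newest revision is also the newest ready one, so nothing is in flight.
  if (svc.latestCreatedRevision === svc.latestReadyRevision) {
    return { status: 'completed', ageMinutes };
  }
  // Assumption: still unready after failAfterMinutes means the rollout failed.
  const status: RolloutStatus = ageMinutes >= failAfterMinutes ? 'failed' : 'in_progress';
  return { status, ageMinutes };
}
```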
Orient: rollout analysis
Analyzes Cloud Run observations and produces findings about:
- Service health status and conditions
- Active rollout progress and duration
- Failed rollouts with error messages
- Error rate trends when metrics are enabled
Sets contributingFactor when rollout failures or elevated error rates are detected.
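The threshold comparison and the contributingFactor flag could look roughly like this. errorRateSeverity, orient, and Finding are hypothetical, and modeling contributingFactor as a boolean is an assumption. The sketch treats the error rate as errors divided by requests and uses a strict "above" comparison, matching the wording of the options table. Treating zero traffic as healthy is also an assumption.

```typescript
type Severity = 'info' | 'warning' | 'error';

// Hypothetical: maps an error rate (fraction of requests, 0 to 1) to a severity.
// Default thresholds match errorRateWarningThreshold / errorRateErrorThreshold.
function errorRateSeverity(errorCount: number, requestCount: number, warnAt = 0.05, errorAt = 0.1): Severity {
  if (requestCount === 0) return 'info'; // assumption: no traffic, nothing to judge
  const rate = errorCount / requestCount;
  if (rate > errorAt) return 'error';
  if (rate > warnAt) return 'warning';
  return 'info';
}

interface Finding {
  service: string;
  summary: string;
  contributingFactor: boolean; // hypothetical boolean form of the flag
}

// Hypothetical orient step: a service is a contributing factor when its rollout
// failed or its error rate crossed the warning threshold.
function orient(service: string, rolloutFailed: boolean, severity: Severity): Finding {
  const reasons: string[] = [];
  if (rolloutFailed) reasons.push('rollout failed');
  if (severity !== 'info') reasons.push(`error rate at ${severity} level`);
  return {
    service,
    summary: reasons.length > 0 ? `${service}: ${reasons.join(', ')}` : `${service}: healthy`,
    contributingFactor: reasons.length > 0,
  };
}
```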
Decide: auto-rollback
When autoRollback is enabled and a rollout is detected as failed (the new revision has not become ready within autoRollbackMinReadyMinutes), the decision strategy proposes shifting traffic back to the last known good revision. Carrying out the rollback also requires readOnly: false.
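Here is a minimal sketch of the rollback proposal. It assumes the last known good revision is the service's latestReadyRevision, the newest revision that did become ready. proposeRollback and RollbackProposal are hypothetical names, and the plugin's decision strategy may choose its target differently.

```typescript
interface TrafficTarget {
  revision: string;
  percent: number;
}

interface RollbackProposal {
  service: string;
  fromRevision: string;
  toRevision: string;
  traffic: TrafficTarget[];
}

// Hypothetical decide step: only a failed rollout with autoRollback enabled
// produces a proposal, which routes 100% of traffic to latestReadyRevision.
function proposeRollback(
  service: string,
  rollout: {
    status: 'in_progress' | 'completed' | 'failed';
    latestCreatedRevision: string;
    latestReadyRevision: string;
  },
  autoRollback: boolean,
): RollbackProposal | null {
  if (!autoRollback || rollout.status !== 'failed') return null;
  // Defensive guard: if the newest revision is ready, there is nothing to roll back from.
  if (rollout.latestReadyRevision === rollout.latestCreatedRevision) return null;
  return {
    service,
    fromRevision: rollout.latestCreatedRevision,
    toRevision: rollout.latestReadyRevision,
    traffic: [{ revision: rollout.latestReadyRevision, percent: 100 }],
  };
}
```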
Act: traffic management
- rollback: Shifts 100% of traffic to the last known good revision. Requires readOnly: false.
- set-traffic: Sets the traffic allocation across revisions. Requires readOnly: false.
- deploy-revision: Deploys a new revision with a specified image. Requires readOnly: false.
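A traffic split has to be well formed before any of these actions reach Cloud Run, which expects integer percentages that add up to 100. The guard below is a hypothetical sketch (validateTraffic is not a plugin export). It also applies the readOnly check described above.

```typescript
interface TrafficTarget {
  revision: string;
  percent: number;
}

// Hypothetical pre-flight check for rollback / set-traffic; returns a list of
// problems, empty when the action may proceed.
function validateTraffic(readOnly: boolean, traffic: TrafficTarget[]): string[] {
  const problems: string[] = [];
  if (readOnly) {
    problems.push('plugin is readOnly; set readOnly: false to enable traffic actions');
  }
  for (const t of traffic) {
    if (!Number.isInteger(t.percent) || t.percent < 0 || t.percent > 100) {
      problems.push(`${t.revision}: percent must be an integer from 0 to 100`);
    }
  }
  // Percentages across all listed revisions must total exactly 100.
  const total = traffic.reduce((sum, t) => sum + t.percent, 0);
  if (total !== 100) problems.push(`percentages sum to ${total}, expected 100`);
  return problems;
}
```

Collecting every problem instead of throwing on the first lets an API handler report all of them in one response.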
REST API endpoints
All endpoints require authentication by default.
GET /cloud-run/services
Lists all monitored Cloud Run services with their current state.
Response:
```json
{
  "services": [
    {
      "key": "my-project/us-central1/api-service",
      "projectId": "my-project",
      "region": "us-central1",
      "service": "api-service",
      "serviceName": "API Service",
      "url": "https://api-service-xyz.a.run.app",
      "latestReadyRevision": "api-service-00042-abc",
      "latestCreatedRevision": "api-service-00042-abc",
      "traffic": [{ "revision": "api-service-00042-abc", "percent": 100 }],
      "rolloutStatus": "completed",
      "updatedAt": "2025-06-15T10:30:00.000Z"
    }
  ]
}
```
GET /cloud-run/services/:name
Returns detailed state for a specific service (matched by service name within the key).
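In client code, the list response maps onto a TypeScript type, and the :name lookup can be mimicked by comparing the last segment of each key. This is a hypothetical sketch, and ServiceState and findByName are illustrative names. It returns every match because the same service name may exist in several projects or regions. This page does not document how the endpoint itself resolves such collisions.

```typescript
interface TrafficTarget {
  revision: string;
  percent: number;
}

// Transcribed from the example response; fields the real API may add are not modeled.
interface ServiceState {
  key: string; // "<projectId>/<region>/<service>"
  projectId: string;
  region: string;
  service: string;
  serviceName: string;
  url: string;
  latestReadyRevision: string;
  latestCreatedRevision: string;
  traffic: TrafficTarget[];
  rolloutStatus: 'in_progress' | 'completed' | 'failed';
  updatedAt: string;
}

// Hypothetical client-side lookup: match the last path segment of each key.
function findByName(services: ServiceState[], name: string): ServiceState[] {
  return services.filter((s) => s.key.split('/').pop() === name);
}
```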
POST /cloud-run/services/:name/rollback
Triggers a rollback to the last known good revision by shifting 100% of traffic to it. Requires readOnly: false.
Response:
```json
{
  "success": true,
  "message": "Rolled back api-service to revision api-service-00041-def"
}
```
POST /cloud-run/services/:name/traffic
Sets the traffic allocation for a service. Requires readOnly: false.
Request body:
```json
{
  "traffic": [
    { "revision": "api-service-00041-def", "percent": 90 },
    { "revision": "api-service-00042-abc", "percent": 10 }
  ]
}
```
Full example
```ts
import { createAgent } from 'zupdev';
import { cloudRun } from 'zupdev/plugins/cloud-run';

const agent = await createAgent({
  name: 'cloud-run-monitor',
  mode: 'continuous',
  loopInterval: 30000,
  api: {
    port: 3000,
    auth: {
      apiKeys: [{ key: process.env.API_KEY!, name: 'admin' }],
    },
  },
  plugins: [
    cloudRun({
      projects: [
        {
          projectId: 'my-production-project',
          regions: ['us-central1', 'europe-west1'],
          // Production: only these services are monitored.
          services: ['api-service', 'worker-service'],
          serviceNameMap: {
            'api-service': 'API Gateway',
            'worker-service': 'Background Worker',
          },
        },
        {
          projectId: 'my-staging-project',
          regions: ['us-central1'],
          // Staging: any service labeled env=staging is monitored.
          labels: { env: 'staging' },
        },
      ],
      pollIntervalMs: 60000,
      readOnly: false, // required for rollback and traffic actions
      autoRollback: true,
      autoRollbackMinReadyMinutes: 5,
      includeLogs: true,
      logQueryWindowMinutes: 10,
      includeMetrics: true,
      metricsWindowMinutes: 5,
      errorRateWarningThreshold: 0.05,
      errorRateErrorThreshold: 0.1,
    }),
  ],
});

const server = agent.startApi({ port: 3000 });
await agent.start();
```
This configuration monitors two GCP projects across multiple regions. Production services are listed explicitly, and staging services are discovered by label. Auto-rollback is enabled: if a new revision fails to become ready within 5 minutes, traffic shifts back to the last known good revision. Cloud Logging and Cloud Monitoring are both enabled for additional context. The REST API on port 3000 allows manual rollbacks and traffic adjustments.
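Once the agent is running, a script can call the rollback and traffic endpoints. The sketch below only builds request descriptors to pass to fetch. The Bearer Authorization header is an assumption, so check which header your api.auth configuration expects. rollbackRequest and canaryRequest are hypothetical helpers, and canaryRequest produces a body with the same shape as the 90/10 traffic example.

```typescript
// Hypothetical request descriptor; pass it to fetch(url, { method, headers, body }).
interface ApiRequest {
  method: 'POST';
  url: string;
  headers: Record<string, string>;
  body?: string;
}

interface TrafficTarget {
  revision: string;
  percent: number;
}

function serviceUrl(baseUrl: string, service: string, action: 'rollback' | 'traffic'): string {
  // Strip trailing slashes so 'http://host:3000/' and 'http://host:3000' behave the same.
  return `${baseUrl.replace(/\/+$/, '')}/cloud-run/services/${encodeURIComponent(service)}/${action}`;
}

// Assumption: the API key is sent as a bearer token.
function rollbackRequest(baseUrl: string, service: string, apiKey: string): ApiRequest {
  // No request body is documented for the rollback endpoint, so none is sent.
  return { method: 'POST', url: serviceUrl(baseUrl, service, 'rollback'), headers: { Authorization: `Bearer ${apiKey}` } };
}

// Builds a two-revision canary split for POST /cloud-run/services/:name/traffic.
function canaryRequest(baseUrl: string, service: string, apiKey: string, stable: string, canary: string, canaryPercent: number): ApiRequest {
  if (!Number.isInteger(canaryPercent) || canaryPercent < 0 || canaryPercent > 100) {
    throw new RangeError('canaryPercent must be an integer from 0 to 100');
  }
  const traffic: TrafficTarget[] = [
    { revision: stable, percent: 100 - canaryPercent },
    { revision: canary, percent: canaryPercent },
  ];
  return {
    method: 'POST',
    url: serviceUrl(baseUrl, service, 'traffic'),
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ traffic }),
  };
}
```

For example, canaryRequest('http://localhost:3000', 'api-service', key, 'api-service-00041-def', 'api-service-00042-abc', 10) describes the same 90/10 split as the request body documented for the traffic endpoint.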