Playbooks
Playbooks are markdown files that get injected into the LLM’s context during the Orient and Decide phases. They carry the stuff your team knows but the LLM doesn’t — runbooks, incident retro learnings, system quirks.
Plugins handle how to observe and act. Playbooks tell the LLM when and why.
Why playbooks?
Zup’s plugins are deterministic TypeScript — great for reliable execution. But the LLM driving investigation and decisions doesn’t know that your /health endpoint returns 200 even when degraded, or that rollbacks should be avoided for deployments older than 2 hours.
Playbooks let anyone on the team write that knowledge down in plain markdown, no TypeScript required.
Playbook format
A playbook is a .md file with YAML frontmatter:

    ---
    name: High Error Rate
    description: Handling sustained high error rates across services
    trigger:
      severity: warning
      keywords: [error rate, 5xx, status 500]
      sources: [http-monitor]
    phases: [orient, decide]
    priority: 10
    ---

    ## Investigation Guidance

    When error rates spike, follow this sequence:

    1. Query error logs for the affected service -- focus on the last 15 minutes
    2. Check for recent deployments (< 30 min window)
    3. Compare error rates before and after any deployment
    4. If a deploy caused it, check the diff for bad config or missing env vars

    ## Decision Rules

    - Error rate > 10% AND recent deploy: recommend rollback (high confidence)
    - Error rate > 10% AND no recent deploy: escalate to human
    - Error rate 5-10% AND recent deploy: canary analysis before rollback
    - NEVER auto-rollback if the deployment is > 2 hours old

    ## System-Specific Context

    - API gateway logs are in CloudWatch under `/prod/api-gateway`
    - Deployments happen via ArgoCD -- check argo for recent syncs
    - The /health endpoint returns 200 even when degraded -- check the `status` field
    - Database connection pool exhaustion looks like 5xx but isn't -- check `pg_stat_activity`

Frontmatter reference
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `string` | — | Required. Human-readable playbook name. |
| `description` | `string` | — | Required. Short description used for logging. |
| `id` | `string` | Derived from filename | Unique identifier. Auto-generated from the filename if not set. |
| `trigger` | `object` | — | Conditions that activate this playbook. No trigger = always active. |
| `trigger.severity` | `ObservationSeverity` | — | Minimum observation severity to activate. |
| `trigger.keywords` | `string[]` | — | Keywords matched against observation data (case-insensitive, any match). |
| `trigger.sources` | `string[]` | — | Observation source prefixes to match (e.g., `http-monitor`). |
| `phases` | `('orient' \| 'decide')[]` | `['orient', 'decide']` | Which OODA phases this playbook applies to. |
| `priority` | `number` | `0` | Ordering when multiple playbooks match. Higher = injected first. |
When a trigger has multiple conditions (e.g., both severity and keywords), all must match (AND logic). To express OR logic, create separate playbooks.
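To make the defaults in the reference table concrete, here is a minimal sketch of how a parsed frontmatter object could be normalized. The `Frontmatter` interface and the `applyDefaults` helper are hypothetical names used for illustration and are not part of Zup's API. The id rule (filename minus the `.md` extension) is an assumption about what "derived from filename" means.

```typescript
// Hypothetical types for illustration; not Zup's actual definitions.
type Phase = 'orient' | 'decide';

interface Frontmatter {
  name: string;
  description: string;
  id?: string;
  // severity is typed loosely here; the real type is ObservationSeverity.
  trigger?: { severity?: string; keywords?: string[]; sources?: string[] };
  phases?: Phase[];
  priority?: number;
}

const DEFAULT_PHASES: Phase[] = ['orient', 'decide'];

// Fill in the documented defaults for any field the author left out.
function applyDefaults(fm: Frontmatter, filename: string) {
  return {
    ...fm,
    // Assumption: the derived id is the filename without its .md extension.
    id: fm.id ?? filename.replace(/\.md$/i, ''),
    phases: fm.phases ?? [...DEFAULT_PHASES],
    priority: fm.priority ?? 0,
  };
}

const normalized = applyDefaults(
  { name: 'High Error Rate', description: 'Handling sustained high error rates across services' },
  'high-error-rate.md',
);
// normalized.id === 'high-error-rate', normalized.priority === 0,
// normalized.phases is ['orient', 'decide'], and trigger stays undefined (always active).
```

Leaving `trigger` undefined rather than defaulting it to an empty object keeps the "no trigger = always active" case easy to distinguish from a trigger that happens to have no conditions.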
Loading playbooks
Playbooks come from three sources:
1. Filesystem directory
Drop .md files in a directory and point the agent at it:

    const agent = await createAgent({
      playbooksDir: './playbooks',
      plugins: [
        investigationOrienter({ tools: [...] }),
      ],
    });

Example layout:

    playbooks/
      high-error-rate.md
      deployment-rollback.md
      database-saturation.md
      our-weird-legacy-api.md

2. Inline in agent options
Pass playbooks directly for programmatic use:

    import { parsePlaybook } from 'zupdev';

    const agent = await createAgent({
      playbooks: [
        parsePlaybook(`---
    name: Always Active
    description: General operational context
    ---
    Our API rate-limits at 1000 req/s. Anything above 800 is a warning sign.`),
      ],
    });

3. Plugin-bundled
Plugins can ship playbooks alongside their code:

    import { definePlugin } from 'zupdev';

    export const myPlugin = () => definePlugin({
      id: 'my-plugin',
      playbooks: [{
        id: 'my-plugin/cascading-failures',
        name: 'Cascading Failure Detection',
        description: 'Identifies shared dependency failures',
        phases: ['orient'],
        priority: 0,
        content: `When multiple endpoints fail simultaneously, check shared dependencies first.
    A single failure is isolated -- multiple failures suggest infrastructure.`,
        source: 'plugin',
      }],
      // ... observers, actions, etc.
    });

How matching works
Each loop iteration, the system checks which playbooks match the current observations:
- Phase filter — only playbooks for the current phase are considered
- Severity — if set, at least one observation must meet or exceed the threshold
- Keywords — if set, at least one keyword must appear in any observation’s data or source
- Sources — if set, at least one observation must come from a matching source prefix
- No trigger at all — the playbook always matches (catch-all)
Matched playbooks are sorted by priority (highest first) and appended to the LLM’s system prompt.
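The rules above can be sketched in a few lines of TypeScript. This is an illustrative re-implementation under stated assumptions, not the source of `matchPlaybooks`: the `Observation` and `Playbook` shapes are simplified, the severity ordering (`info` < `warning` < `error` < `critical`) is assumed, keyword matching against "data" is approximated by searching the JSON-serialized payload, and each condition is checked independently across all observations.

```typescript
// Hypothetical, simplified shapes; the real Zup types may differ.
type Severity = 'info' | 'warning' | 'error' | 'critical';
type Phase = 'orient' | 'decide';

interface Observation { source: string; severity: Severity; data: unknown; }
interface Trigger { severity?: Severity; keywords?: string[]; sources?: string[]; }
interface Playbook { name: string; phases: Phase[]; priority: number; trigger?: Trigger; }

// Assumed ordering, lowest to highest.
const SEVERITY_ORDER: Severity[] = ['info', 'warning', 'error', 'critical'];

function triggerMatches(trigger: Trigger | undefined, observations: Observation[]): boolean {
  if (!trigger) return true; // no trigger at all: catch-all

  // Every condition that is set must pass (AND logic).
  if (trigger.severity !== undefined) {
    const min = SEVERITY_ORDER.indexOf(trigger.severity);
    if (!observations.some((o) => SEVERITY_ORDER.indexOf(o.severity) >= min)) return false;
  }
  if (trigger.keywords !== undefined) {
    // Case-insensitive search over each observation's source and serialized data.
    const haystacks = observations.map((o) => `${o.source} ${JSON.stringify(o.data)}`.toLowerCase());
    const hit = trigger.keywords.some((k) => haystacks.some((h) => h.includes(k.toLowerCase())));
    if (!hit) return false;
  }
  if (trigger.sources !== undefined) {
    const prefixes = trigger.sources;
    if (!observations.some((o) => prefixes.some((p) => o.source.startsWith(p)))) return false;
  }
  return true;
}

function matchPlaybooksSketch(playbooks: Playbook[], observations: Observation[], phase: Phase): Playbook[] {
  return playbooks
    .filter((p) => p.phases.includes(phase))                 // phase filter
    .filter((p) => triggerMatches(p.trigger, observations))  // trigger conditions
    .sort((a, b) => b.priority - a.priority);                // highest priority first
}

const observations: Observation[] = [
  { source: 'http-monitor/api', severity: 'error', data: { message: 'Error rate at 12%' } },
];
const playbooks: Playbook[] = [
  { name: 'General context', phases: ['orient', 'decide'], priority: 0 },
  { name: 'High Error Rate', phases: ['orient', 'decide'], priority: 10,
    trigger: { severity: 'warning', keywords: ['error rate', '5xx'], sources: ['http-monitor'] } },
  { name: 'Database Saturation', phases: ['orient'], priority: 5,
    trigger: { keywords: ['connection pool'] } },
];

const matched = matchPlaybooksSketch(playbooks, observations, 'orient').map((p) => p.name);
// matched is ['High Error Rate', 'General context']
```

In this sketch the trigger-less playbook matches every time, so it is worth keeping such catch-all playbooks short and at a low priority.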
Integration with investigation-orienter
The main integration point is the investigation-orienter plugin. When it runs a multi-turn investigation, matched playbooks are appended to the system prompt:

    [Default investigation prompt]

    ---

    ## Operational Playbooks

    The following playbooks are relevant to the current observations...

    ### Playbook: High Error Rate
    [playbook content injected here]

The LLM sees your team’s operational knowledge right alongside its instructions to query logs and check metrics.
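As a rough illustration of that layout, the following sketch assembles a prompt string in the same shape. `augmentPromptSketch` and `PlaybookLike` are hypothetical names, and the exact separators, the intro sentence (copied here with its trailing "..." as shown above), and the behavior for an empty list are assumptions rather than the actual output of `buildAugmentedSystemPrompt`.

```typescript
// Hypothetical minimal shape; the real Playbook type carries more fields.
interface PlaybookLike {
  name: string;
  content: string;
}

function augmentPromptSketch(base: string, playbooks: PlaybookLike[]): string {
  // Assumption: with nothing to append, the base prompt is returned unchanged.
  if (playbooks.length === 0) return base;

  // One "### Playbook: <name>" section per matched playbook, in the order given.
  const sections = playbooks.map((p) => `### Playbook: ${p.name}\n${p.content.trim()}`);

  return [
    base,
    '---',
    '## Operational Playbooks',
    'The following playbooks are relevant to the current observations...',
    ...sections,
  ].join('\n\n');
}

const augmented = augmentPromptSketch('You are investigating a production incident.', [
  { name: 'High Error Rate', content: 'NEVER auto-rollback if the deployment is > 2 hours old\n' },
]);
```

Because the playbooks arrive already sorted by priority, the highest-priority guidance ends up closest to the base instructions.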
Playbook injection is on by default. To disable it:

    investigationOrienter({
      tools: [...],
      enablePlaybooks: false,
    })

Programmatic API

    import {
      parsePlaybook,
      loadPlaybooksFromDir,
      matchPlaybooks,
      buildAugmentedSystemPrompt,
    } from 'zupdev';

| Function | Description |
|---|---|
| `parsePlaybook(raw, options?)` | Parse a markdown string into a Playbook object. |
| `loadPlaybooksFromDir(dir, logger?)` | Load all .md files from a directory. Skips invalid files. |
| `matchPlaybooks(playbooks, observations, phase)` | Return playbooks that match the current observations for a phase. |
| `buildAugmentedSystemPrompt(base, playbooks)` | Append matched playbooks to a system prompt string. |
One playbook per failure mode or system quirk. Smaller playbooks match more precisely and don’t waste LLM context on irrelevant stuff.
Use triggers. A playbook about database saturation shouldn’t activate for CSS deployment failures. Keywords and source filters keep the noise down.
After resolving an incident, write a playbook with what you learned. Your DBA can write one about connection pool patterns, your SRE can document deployment quirks — it’s just markdown.