LLM Providers

Zup wraps the Vercel AI SDK so you can use any supported LLM provider with the same interface. Supported providers include Anthropic, OpenAI, Google Gemini, Mistral, Groq, xAI, Cohere, Perplexity, Together AI, DeepInfra, Cerebras, OpenRouter, Azure OpenAI, Amazon Bedrock, Google Vertex AI, and any OpenAI-compatible endpoint.

LLM configuration is optional. Many plugins (like http-monitor) work without an LLM. Plugins that require LLM access (like investigation-orienter) will check for ctx.llm at runtime.

Set the llm field in your agent options:

```ts
import { createAgent } from 'zupdev';

const agent = await createAgent({
  name: 'my-agent',
  llm: {
    provider: 'anthropic',
    apiKey: process.env.ANTHROPIC_API_KEY!,
    model: 'claude-sonnet-4-6',
  },
  plugins: [...],
});
```
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| provider | 'anthropic' | Yes | Selects the Anthropic provider. |
| apiKey | string | Yes | Anthropic API key. |
| model | string | Yes | Model name (e.g., 'claude-sonnet-4-6', 'claude-haiku-4-20250514'). |
| baseURL | string | No | Custom API endpoint. Useful for proxies or API gateways. |
OpenAI:

```ts
const agent = await createAgent({
  name: 'my-agent',
  llm: {
    provider: 'openai',
    apiKey: process.env.OPENAI_API_KEY!,
    model: 'gpt-4o',
  },
  plugins: [...],
});
```
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| provider | 'openai' | Yes | Selects the OpenAI provider. |
| apiKey | string | Yes | OpenAI API key. |
| model | string | Yes | Model name (e.g., 'gpt-4o', 'gpt-4o-mini'). |
| baseURL | string | No | Custom API endpoint. |
| organization | string | No | OpenAI organization ID. |
Google Gemini:

```ts
llm: {
  provider: 'google',
  apiKey: process.env.GOOGLE_API_KEY!,
  model: 'gemini-2.0-flash',
}
```

Mistral:

```ts
llm: {
  provider: 'mistral',
  apiKey: process.env.MISTRAL_API_KEY!,
  model: 'mistral-large-latest',
}
```

Groq:

```ts
llm: {
  provider: 'groq',
  apiKey: process.env.GROQ_API_KEY!,
  model: 'llama-3.3-70b-versatile',
}
```

xAI:

```ts
llm: {
  provider: 'xai',
  apiKey: process.env.XAI_API_KEY!,
  model: 'grok-2',
}
```

OpenRouter:

```ts
llm: {
  provider: 'openrouter',
  apiKey: process.env.OPENROUTER_API_KEY!,
  model: 'anthropic/claude-sonnet-4',
}
```

Azure OpenAI:

```ts
llm: {
  provider: 'azure',
  apiKey: process.env.AZURE_API_KEY!,
  model: 'gpt-4o',
  resourceName: 'my-resource',
  apiVersion: '2024-12-01-preview', // optional
}
```

Amazon Bedrock:

```ts
llm: {
  provider: 'amazon-bedrock',
  model: 'anthropic.claude-sonnet-4-6-v1:0',
  region: 'us-east-1',
  accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
  secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
}
```

Google Vertex AI:

```ts
llm: {
  provider: 'google-vertex',
  model: 'gemini-2.0-flash',
  project: 'my-gcp-project',
  location: 'us-central1',
}
```

Cohere, Perplexity, Together AI, DeepInfra, and Cerebras all follow the same simple pattern:

```ts
llm: {
  provider: 'cohere', // or 'perplexity', 'togetherai', 'deepinfra', 'cerebras'
  apiKey: process.env.COHERE_API_KEY!,
  model: 'command-a-08-2025',
}
```

For any provider that exposes an OpenAI-compatible API (Ollama, vLLM, LiteLLM, etc.):

```ts
llm: {
  provider: 'openai-compatible',
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama',
  model: 'llama3.1',
}
```

When an LLM is configured, ctx.llm is populated with an LLMCapability object that provides four methods:

```ts
type LLMCapability = {
  provider: LLMProvider;
  config: LLMConfig;
  generateText(prompt: string, options?: GenerateOptions): Promise<TextResult>;
  generateStructured<T>(prompt: string, schema: ZodSchema<T>, options?: GenerateOptions): Promise<T>;
  streamText(prompt: string, options?: GenerateOptions): AsyncIterable<TextChunk>;
  chat(messages: ChatMessage[], options?: ChatOptions): Promise<ChatResult>;
};
```

The simplest usage — send a prompt, get text back.

```ts
const result = await ctx.llm.generateText(
  'Summarize the current system health based on these metrics: ...',
  {
    temperature: 0.3,
    maxTokens: 500,
    system: 'You are an SRE agent analyzing system health.',
  }
);

console.log(result.text);  // The generated text
console.log(result.usage); // { promptTokens, completionTokens, totalTokens }
console.log(result.model); // The actual model that responded
```

TextResult:

```ts
type TextResult = {
  text: string;
  usage?: TokenUsage;
  finishReason?: 'stop' | 'length' | 'content_filter' | 'tool_calls';
  model?: string;
};
```

Use a Zod schema to get validated, typed output from the LLM. The AI SDK uses native structured output where the provider supports it (OpenAI, Google) and tool-based extraction as a fallback, with automatic Zod schema validation.

```ts
import { z } from 'zod';

const HealthSummary = z.object({
  status: z.enum(['healthy', 'degraded', 'down']),
  affectedServices: z.array(z.string()),
  severity: z.enum(['low', 'medium', 'high', 'critical']),
  recommendation: z.string(),
});
type HealthSummary = z.infer<typeof HealthSummary>;

const summary: HealthSummary = await ctx.llm.generateStructured(
  'Analyze these observations and determine system health: ...',
  HealthSummary,
  {
    temperature: 0.1, // Lower temperature for more deterministic structured output
    system: 'You are an SRE agent. Respond with a structured health assessment.',
  }
);

// summary is fully typed as HealthSummary
console.log(summary.status);           // 'degraded'
console.log(summary.affectedServices); // ['api-gateway', 'auth-service']
```

If the LLM returns invalid output or the response fails Zod validation, generateStructured throws an error.
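Because the call throws on validation failure, callers that need resilience can wrap it in a retry. Here is a generic sketch (not a zupdev API) that retries any async operation a few times before giving up:

```ts
// Sketch: retry an async operation that may throw, such as a
// generateStructured call whose output failed Zod validation.
// Not part of zupdev's API; `attempts` bounds the total tries.
async function withRetries<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err; // validation or transport error; try the call again
    }
  }
  throw lastError;
}
```

Usage would look like `await withRetries(() => ctx.llm!.generateStructured(prompt, HealthSummary))`.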

For long-running generation or real-time output, use streamText to get an async iterable of text chunks:

```ts
const stream = ctx.llm.streamText(
  'Explain the root cause of this outage in detail: ...',
  {
    maxTokens: 2000,
    system: 'You are an SRE agent performing post-incident analysis.',
  }
);

for await (const chunk of stream) {
  process.stdout.write(chunk.text);
  if (chunk.done) {
    console.log('\n--- Generation complete ---');
  }
}
```

TextChunk:

```ts
type TextChunk = {
  text: string;
  done: boolean; // true on the final chunk
};
```

The chat method supports multi-turn conversations and LLM tool calling. This is the foundation for the investigation system.

```ts
const tools = [
  {
    name: 'query_logs',
    description: 'Search application logs',
    inputSchema: {
      type: 'object',
      properties: {
        query: { type: 'string', description: 'Log search query' },
        timeRange: { type: 'string', description: 'Time range (e.g., "1h", "30m")' },
      },
      required: ['query'],
    },
  },
  {
    name: 'get_metrics',
    description: 'Fetch system metrics',
    inputSchema: {
      type: 'object',
      properties: {
        metric: { type: 'string', description: 'Metric name' },
        period: { type: 'string', description: 'Time period' },
      },
      required: ['metric'],
    },
  },
];

const messages: ChatMessage[] = [
  { role: 'user', content: 'Investigate why the API latency spiked at 14:30 UTC.' },
];

const result = await ctx.llm.chat(messages, {
  tools,
  system: 'You are an SRE agent. Use the available tools to investigate.',
  maxTokens: 4096,
});

// Check if the LLM wants to call tools
if (result.stopReason === 'tool_use') {
  for (const toolCall of result.toolCalls) {
    console.log(`Tool call: ${toolCall.name}(${JSON.stringify(toolCall.input)})`);
    // Execute the tool and feed results back...
  }
}
```
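The "execute and feed results back" step can be completed into a full loop: append the assistant turn, answer each tool call with a tool message, and call chat again until the model stops asking for tools. A self-contained sketch follows; the types are redeclared locally to mirror the ChatMessage/ChatResult shapes documented below, and `toolHandlers` is a hypothetical map from tool name to an executor you supply.

```ts
// Sketch of a multi-turn tool-calling loop. Types mirror the documented
// ChatMessage/ChatResult shapes; `toolHandlers` is a hypothetical map
// from tool name to an async executor returning a string result.
type ToolCall = { id: string; name: string; input: unknown };
type ChatMessage =
  | { role: 'user'; content: string }
  | { role: 'assistant'; content: string; toolCalls?: ToolCall[] }
  | { role: 'tool'; toolCallId: string; content: string };
type ChatResult = {
  content: string;
  toolCalls: ToolCall[];
  stopReason: 'end_turn' | 'tool_use' | 'max_tokens' | 'stop_sequence';
};
type ChatFn = (messages: ChatMessage[]) => Promise<ChatResult>;

async function runToolLoop(
  chat: ChatFn,
  toolHandlers: Record<string, (input: unknown) => Promise<string>>,
  messages: ChatMessage[],
  maxTurns = 10,
): Promise<string> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const result = await chat(messages);
    if (result.stopReason !== 'tool_use') return result.content;
    // Record the assistant turn, then answer each tool call with a tool message.
    messages.push({ role: 'assistant', content: result.content, toolCalls: result.toolCalls });
    for (const call of result.toolCalls) {
      const handler = toolHandlers[call.name];
      const output = handler ? await handler(call.input) : `Unknown tool: ${call.name}`;
      messages.push({ role: 'tool', toolCallId: call.id, content: output });
    }
  }
  throw new Error('Tool loop exceeded maxTurns without a final answer');
}
```

With an actual agent this would be invoked as `runToolLoop((m) => ctx.llm!.chat(m, { tools, system }), handlers, messages)`.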

ChatMessage types:

```ts
type ChatMessage =
  | { role: 'user'; content: string }
  | { role: 'assistant'; content: string; toolCalls?: ToolCall[] }
  | { role: 'tool'; toolCallId: string; content: string };
```

ChatResult:

```ts
type ChatResult = {
  content: string;
  toolCalls: ToolCall[];
  stopReason: 'end_turn' | 'tool_use' | 'max_tokens' | 'stop_sequence';
  usage?: TokenUsage;
  model?: string;
};
```

ToolDefinition:

```ts
type ToolDefinition = {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema object
};
```

All generation methods accept an optional GenerateOptions object:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| maxTokens | number | 4096 | Maximum tokens to generate. |
| temperature | number | Provider default | Sampling temperature (0-2). Lower values produce more deterministic output. |
| topP | number | Provider default | Top-P / nucleus sampling. |
| stop | string[] | | Stop sequences that halt generation. |
| timeout | number | | Request timeout in milliseconds. |
| system | string | | System prompt prepended to the conversation. |

ChatOptions extends GenerateOptions with an additional tools field:

| Field | Type | Description |
| --- | --- | --- |
| tools | ToolDefinition[] | Tool definitions the LLM can call. |

Plugins access the LLM through ctx.llm. Always check for its existence first, since LLM configuration is optional:

```ts
import { definePlugin, createOrienter } from 'zupdev';

export const myPlugin = () => definePlugin({
  id: 'my-plugin',
  orienters: {
    analyze: createOrienter({
      name: 'llm-analysis',
      description: 'Use LLM to analyze observations',
      orient: async (observations, ctx) => {
        if (!ctx.llm) {
          return {
            source: 'my-plugin/analyze',
            findings: ['LLM not configured -- skipping analysis'],
            confidence: 0.3,
          };
        }
        const result = await ctx.llm.generateText(
          `Analyze these observations: ${JSON.stringify(observations)}`,
          { temperature: 0.2 }
        );
        return {
          source: 'my-plugin/analyze',
          findings: [result.text],
          confidence: 0.8,
        };
      },
    }),
  },
});
```

The investigation-orienter plugin is a production example of LLM-powered orientation. It uses ctx.llm.chat with tool calling to run a multi-turn investigation loop within the Orient phase:

```ts
import { z } from 'zod';
import { investigationOrienter } from 'zupdev/plugins/investigation-orienter';

const agent = await createAgent({
  llm: {
    provider: 'anthropic',
    apiKey: process.env.ANTHROPIC_API_KEY!,
    model: 'claude-sonnet-4-6',
  },
  plugins: [
    investigationOrienter({
      triggerSeverity: 'warning', // Only investigate warning+ observations
      maxTurns: 15,               // Max tool-calling rounds
      tools: [
        {
          name: 'query_logs',
          description: 'Search logs',
          parameters: z.object({ query: z.string() }),
          execute: async (params) => {
            // Query your logging system here and return the results as a string
            const results = await searchLogs(params.query); // hypothetical helper
            return JSON.stringify(results);
          },
        },
      ],
    }),
  ],
});
```

The investigation orienter checks whether any observation meets the triggerSeverity threshold. If so, it builds a prompt from the observations and runs a tool-calling loop. The LLM’s final response is parsed into a SituationAssessment with extracted findings, contributing factors, and impact assessment.

If you need LLM access outside of an agent context, you can create a provider directly:

import { createLLMProvider, createLLMCapability } from 'zupdev';
// Create a raw provider
const provider = createLLMProvider({
provider: 'anthropic',
apiKey: process.env.ANTHROPIC_API_KEY!,
model: 'claude-sonnet-4-6',
});
const result = await provider.generateText('Hello, world!');
// Or create a full LLMCapability (same object that appears on ctx.llm)
const llm = createLLMCapability({
provider: 'openai',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-4o',
});
const structured = await llm.generateStructured('...', myZodSchema);

All methods that return TextResult or ChatResult include optional usage information:

```ts
type TokenUsage = {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
};
```

Token counts are provided by the upstream API and may not be available for all providers (especially some OpenAI-compatible ones).
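Because usage is optional, code that aggregates token counts across calls (for cost tracking, say) should guard against the missing case. A small sketch, with TokenUsage redeclared locally so it compiles on its own:

```ts
// Sketch: accumulate optional TokenUsage values across calls.
// `usage` is undefined when a provider does not report counts, so skip it.
type TokenUsage = { promptTokens: number; completionTokens: number; totalTokens: number };

function addUsage(total: TokenUsage, usage?: TokenUsage): TokenUsage {
  if (!usage) return total;
  return {
    promptTokens: total.promptTokens + usage.promptTokens,
    completionTokens: total.completionTokens + usage.completionTokens,
    totalTokens: total.totalTokens + usage.totalTokens,
  };
}
```

Typical use: fold `result.usage` from each TextResult or ChatResult into a running total.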