# Model Configuration

> Available AI model providers and pricing

Struere supports multiple LLM providers for agent execution. Each agent can be configured with a specific model and inference parameters using the `"provider/model-name"` format.

## Supported Providers

| Provider | Model ID Prefix | Notes |
|----------|----------------|-------|
| xAI | `xai/` | Default provider |
| Anthropic | `anthropic/` | Claude models |
| OpenAI | `openai/` | GPT and o-series models |
| Google | `google/` | Gemini models |
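A model ID splits on the first `/` into a provider prefix and a model name. A minimal sketch of that parsing (the helper name is illustrative, not part of the SDK):

```typescript
// Split a "provider/model-name" ID into its two parts.
// parseModelId is an illustrative helper, not a Struere export.
function parseModelId(id: string): { provider: string; model: string } {
  const i = id.indexOf("/");
  if (i === -1) throw new Error(`Invalid model ID: ${id}`);
  return { provider: id.slice(0, i), model: id.slice(i + 1) };
}
```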

## Anthropic Models

| Model | Input (per MTok) | Output (per MTok) | Best For |
|-------|-------------------|--------------------|----------|
| `claude-haiku-4-5` | $1 | $5 | High-volume, cost-sensitive tasks |
| `claude-sonnet-4` | $3 | $15 | Balanced reasoning and cost |
| `claude-sonnet-4-5` | $3 | $15 | Improved coding and reasoning over Sonnet 4 |
| `claude-sonnet-4-6` | $3 | $15 | Latest Sonnet — strongest reasoning |
| `claude-opus-4-5` | $15 | $75 | Deep analysis and research-grade tasks |
| `claude-opus-4-6` | $15 | $75 | Latest Opus — most capable model |

## OpenAI Models

| Model | Input (per MTok) | Output (per MTok) | Best For |
|-------|-------------------|--------------------|----------|
| `gpt-5.2` | $1.75 | $14 | Latest GPT — most capable |
| `gpt-5.1` | $1.25 | $10 | Strong general-purpose |
| `gpt-5` | $1.25 | $10 | General-purpose, multimodal |
| `gpt-5-mini` | $0.25 | $2 | Cost-effective GPT-5 |
| `gpt-5-nano` | $0.05 | $0.40 | Ultra-low cost |
| `gpt-4.1` | $2 | $8 | Reliable general-purpose |
| `gpt-4.1-mini` | $0.40 | $1.60 | Lightweight GPT-4.1 |
| `gpt-4.1-nano` | $0.10 | $0.40 | Budget-friendly |
| `gpt-4o` | $2.50 | $10 | Multimodal |
| `gpt-4o-mini` | $0.15 | $0.60 | Fast, cost-effective |
| `o4-mini` | $1.10 | $4.40 | Latest efficient reasoning |
| `o3` | $2 | $8 | Strong reasoning |
| `o3-mini` | $1.10 | $4.40 | Efficient reasoning |
| `o3-pro` | $20 | $80 | Maximum reasoning capability |
| `o1` | $15 | $60 | Complex reasoning and analysis |
| `o1-mini` | $1.10 | $4.40 | Lightweight reasoning |

## Google Models

| Model | Input (per MTok) | Output (per MTok) | Best For |
|-------|-------------------|--------------------|----------|
| `gemini-3-pro-preview` | $2 | $12 | Latest Gemini — preview |
| `gemini-2.5-pro` | $1.25 | $10 | Most capable stable Gemini |
| `gemini-2.5-flash` | $0.30 | $2.50 | Fast with strong reasoning |
| `gemini-2.0-flash` | $0.10 | $0.40 | High-speed tasks |
| `gemini-1.5-pro` | $1.25 | $5 | Long-context analysis |
| `gemini-1.5-flash` | $0.075 | $0.30 | Budget-friendly with long context |

## xAI Models

| Model | Input (per MTok) | Output (per MTok) | Context | Best For |
|-------|-------------------|--------------------|---------|----------|
| `grok-4-1-fast` | $0.20 | $0.50 | 2M | **Default** — Fast agentic tool calling |
| `grok-4-1-fast-reasoning` | $0.20 | $0.50 | 2M | Frontier agentic tool calling with reasoning |
| `grok-4-1-fast-non-reasoning` | $0.20 | $0.50 | 2M | Fast responses without reasoning overhead |
| `grok-4-0709` | $3 | $15 | 256K | Most capable Grok — deep reasoning |
| `grok-3` | $3 | $15 | 131K | Strong general-purpose |
| `grok-3-mini` | $0.30 | $0.50 | 131K | Cost-effective with reasoning |
| `grok-code-fast-1` | $0.20 | $1.50 | 256K | Optimized for agentic coding |

## Choosing a Model

- **grok-4-1-fast** — The default model. Fast agentic tool calling with a 2M context window at low cost.
- **grok-4-1-fast-reasoning** — Same as default but with explicit reasoning. Best for complex multi-step workflows.
- **claude-sonnet-4** — Strong reasoning with balanced cost, suitable for nuanced decision-making.
- **claude-haiku-4-5** — Use for high-volume, cost-sensitive agents. Fast and capable enough for data management, scheduling, and standard workflows.
- **claude-opus-4-6** — Use for agents that require the highest possible capability, such as complex analysis or research tasks.
- **gpt-4o-mini** / **gemini-2.5-flash** — Good alternatives for cost-sensitive, high-throughput workloads.
- **grok-3-mini** — Budget-friendly xAI option with reasoning capabilities.

## Configuration Options

The `model` field in an agent definition accepts the following options:

```typescript
model: {
  model: "openai/gpt-5-mini",
  temperature: 0.7,
  maxTokens: 4096,
}
```

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `model` | `string` | `"openai/gpt-5-mini"` | Model ID in `"provider/model-name"` format |
| `temperature` | `number` | `0.7` | Controls randomness. Lower values (0.0-0.3) produce more deterministic output. Higher values (0.7-1.0) produce more creative output. |
| `maxTokens` | `number` | `4096` | Maximum number of tokens in the model's response |
| `reasoning` | `ReasoningConfig` | `undefined` | Reasoning / chain-of-thought settings (see below) |

## Reasoning Configuration

Models that support chain-of-thought reasoning (e.g., `gpt-5-mini`, Claude with extended thinking) can be configured to reason internally without the reasoning text appearing in the final response.

```typescript
model: {
  model: "openai/gpt-5-mini",
  temperature: 0.7,
  maxTokens: 4096,
  reasoning: {
    enabled: true,
    effort: "medium",
    budgetTokens: 12000,
    hideFromResponse: true,
  },
}
```

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | `boolean` | `false` | Toggle reasoning on or off |
| `effort` | `"minimal" \| "low" \| "medium" \| "high"` | `"medium"` | Controls how much reasoning the model performs |
| `budgetTokens` | `number` | `undefined` | Maximum tokens allocated to reasoning (Anthropic only) |
| `hideFromResponse` | `boolean` | `true` | Strip reasoning from the API response text |

### Effort Levels

The `effort` value maps to each provider's native reasoning parameter:

| Provider | Supported Levels | Mapping |
|----------|-----------------|---------|
| OpenAI | `minimal`, `low`, `medium`, `high` | Maps directly to `reasoningEffort` |
| Anthropic | `low`, `medium`, `high` | `minimal` maps to `low` |
| xAI | `low`, `high` | `minimal` and `medium` map to `low` |
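The table above can be sketched as a small lookup. The function and type names here are assumptions for illustration, not the platform's internal API:

```typescript
type Effort = "minimal" | "low" | "medium" | "high";
type Provider = "openai" | "anthropic" | "xai";

// Illustrative sketch of the cross-provider effort mapping.
function mapEffort(provider: Provider, effort: Effort): string {
  switch (provider) {
    case "openai":
      return effort; // all four levels pass through as reasoningEffort
    case "anthropic":
      return effort === "minimal" ? "low" : effort; // minimal downgrades to low
    case "xai":
      return effort === "high" ? "high" : "low"; // only low and high supported
  }
}
```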

### Budget Tokens

The `budgetTokens` option sets a hard cap on the number of tokens the model can spend on reasoning. This only applies to Anthropic models and is ignored by other providers.

### Response Handling

When `hideFromResponse` is `true` (the default), reasoning tokens are stripped from the text returned in the API response. The full reasoning content is always stored in the execution record in the database, so you can inspect it in the dashboard for debugging and observability.

### Billing

Reasoning tokens are tracked separately in execution records and billed at the output token rate.

## Default Configuration

If no model is specified in the agent definition, the default configuration is used:

```typescript
{
  model: "openai/gpt-5-mini",
  temperature: 0.7,
  maxTokens: 4096,
}
```

## API Key Resolution

When an agent makes an LLM call, the platform resolves the API key automatically using a three-tier fallback:

1. **Direct provider key** -- If the organization has a key configured for the model's provider in **Settings > Providers**, that key is used. No credits are consumed.
2. **OpenRouter key** -- If the organization has an OpenRouter API key configured, all LLM calls are routed through OpenRouter. No credits are consumed.
3. **Platform credits** -- If no direct or OpenRouter key is found, the platform uses its own OpenRouter key and deducts credits from the organization's balance.

This means you can start using any model immediately with platform credits, then add your own keys when ready to reduce costs.
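The fallback chain can be sketched as follows. This is an illustration of the resolution order only; the actual lookup happens server-side, and the names below are assumptions:

```typescript
// Hypothetical shape of an organization's configured keys.
interface OrgKeys {
  providerKeys: Record<string, string>; // e.g. { anthropic: "sk-ant-..." }
  openRouterKey?: string;
}

// Resolve which key to use for a call to the given provider.
function resolveApiKey(org: OrgKeys, provider: string, platformKey: string) {
  if (org.providerKeys[provider]) {
    return { key: org.providerKeys[provider], usesCredits: false }; // tier 1: direct key
  }
  if (org.openRouterKey) {
    return { key: org.openRouterKey, usesCredits: false }; // tier 2: org's OpenRouter key
  }
  return { key: platformKey, usesCredits: true }; // tier 3: platform credits deducted
}
```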

## Examples

### Cost-Optimized Agent

For high-volume, straightforward tasks:

```typescript
import { defineAgent } from 'struere'

export default defineAgent({
  name: "Data Entry Agent",
  slug: "data-entry",
  version: "0.1.0",
  systemPrompt: "You process incoming data and create records.",
  model: {
    model: "anthropic/claude-haiku-4-5",
    temperature: 0.1,
    maxTokens: 2048,
  },
  tools: ["entity.create", "entity.query"],
})
```

### High-Capability Agent

For complex reasoning and multi-step workflows:

```typescript
import { defineAgent } from 'struere'

export default defineAgent({
  name: "Strategy Advisor",
  slug: "strategy",
  version: "0.1.0",
  systemPrompt: "You analyze business data and provide strategic recommendations.",
  model: {
    model: "anthropic/claude-sonnet-4",
    temperature: 0.5,
    maxTokens: 8192,
  },
  tools: ["entity.query", "event.query"],
})
```

### Deterministic Agent

For tasks requiring consistent, reproducible output:

```typescript
import { defineAgent } from 'struere'

export default defineAgent({
  name: "Report Generator",
  slug: "reports",
  version: "0.1.0",
  systemPrompt: "You generate structured reports from your data.",
  model: {
    model: "anthropic/claude-haiku-4-5",
    temperature: 0.0,
    maxTokens: 4096,
  },
  tools: ["entity.query", "event.query"],
})
```

### Reasoning Agent

For tasks that benefit from step-by-step thinking:

```typescript
import { defineAgent } from 'struere'

export default defineAgent({
  name: "Research Analyst",
  slug: "research-analyst",
  version: "0.1.0",
  systemPrompt: "You analyze complex datasets and produce research summaries.",
  model: {
    model: "anthropic/claude-sonnet-4-6",
    temperature: 0.3,
    maxTokens: 8192,
    reasoning: {
      enabled: true,
      effort: "high",
      budgetTokens: 16000,
    },
  },
  tools: ["entity.query", "event.query"],
})
```

### OpenAI Provider

```typescript
import { defineAgent } from 'struere'

export default defineAgent({
  name: "GPT Agent",
  slug: "gpt-agent",
  version: "0.1.0",
  systemPrompt: "You assist with general queries.",
  model: {
    model: "openai/gpt-4o-mini",
    temperature: 0.7,
    maxTokens: 4096,
  },
  tools: ["entity.query"],
})
```

### xAI Provider

```typescript
import { defineAgent } from 'struere'

export default defineAgent({
  name: "Grok Agent",
  slug: "grok-agent",
  version: "0.1.0",
  systemPrompt: "You assist with general queries.",
  model: {
    model: "xai/grok-4-1-fast",
    temperature: 0.7,
    maxTokens: 4096,
  },
  tools: ["entity.query"],
})
```

### Google Provider

```typescript
import { defineAgent } from 'struere'

export default defineAgent({
  name: "Gemini Agent",
  slug: "gemini-agent",
  version: "0.1.0",
  systemPrompt: "You assist with general queries.",
  model: {
    model: "google/gemini-1.5-flash",
    temperature: 0.7,
    maxTokens: 4096,
  },
  tools: ["entity.query"],
})
```

## Provider Configuration

Configure API keys in the dashboard under **Settings > Providers**. You can add direct provider keys (e.g., an Anthropic API key) or an OpenRouter key that works across all providers.

When no keys are configured, the platform uses its own OpenRouter key and deducts credits from your organization balance. See the [API Key Resolution](#api-key-resolution) section above for the full fallback chain.

## Token Usage

Token usage is tracked per interaction and returned in the chat API response:

```json
{
  "usage": {
    "inputTokens": 1250,
    "outputTokens": 45,
    "reasoningTokens": 320,
    "totalTokens": 1615
  }
}
```

When reasoning is enabled, `reasoningTokens` is included in the usage breakdown. Reasoning tokens are billed at the output token rate.

Each tool call within the agent's LLM loop counts toward token usage. Multi-agent conversations (via `agent.chat`) track usage independently per agent in the chain. When using platform credits, token usage is automatically deducted from your organization's credit balance.
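As a back-of-the-envelope check, a usage record can be priced with the per-MTok rates from the pricing tables above, counting reasoning tokens at the output rate as described. The helper below is an illustration, not a billing API:

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
  reasoningTokens?: number;
}

// Estimate cost in USD from per-million-token (MTok) rates.
// Reasoning tokens are billed at the output rate.
function estimateCostUsd(
  usage: Usage,
  inputPerMTok: number,
  outputPerMTok: number,
): number {
  const outputLike = usage.outputTokens + (usage.reasoningTokens ?? 0);
  return (usage.inputTokens / 1e6) * inputPerMTok + (outputLike / 1e6) * outputPerMTok;
}
```

For the usage payload above at `gpt-5-mini` rates ($0.25 input, $2 output per MTok), this works out to roughly a tenth of a cent.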
