# Voice Cookbook

> Worked patterns for orchestrator + voice agent, with prompt templates and failure-mode triage.

A worked-pattern guide for shipping voice agents. Uses the sportistics callup project as the running example: a volleyball club where players cancel via WhatsApp and a voice agent calls a substitute.

**The pattern in three sentences:** an orchestrator agent receives a WhatsApp message from a player, looks up the player and their next match, and, if the player cancels, calls a voice agent via `agent.chat` with the match context already packaged. The voice agent runs inside `voice.call`, lives on the phone, and picks a replacement candidate. On each call it runs a silent setup, greets once, and branches through confirmation without re-greeting.

## 1. Architecture: orchestrator vs voice session

The orchestrator and the voice agent do different jobs and run in different runtimes. Keeping them separate is what makes the system debuggable.

| Role | Runtime | Job |
|------|---------|-----|
| Orchestrator | Text — `agent.chat` / HTTP `/v1/chat` | Decide WHO to call and what context they need |
| Voice agent | Inside `voice.call` — OpenAI Realtime over Twilio Media Streams | Talk to a human on the phone |

You have two structural options:

**Option A: two separate agents** (recommended). One slug for the orchestrator (e.g. `whatsapp-callup`), one for the voice agent (e.g. `voice-suplente`). The orchestrator invokes the voice agent via `agent.chat`. Each agent has a focused system prompt, a focused tool set, and is tested independently.

**Option B: one agent with a two-mode prompt.** A single agent slug whose system prompt has a `MODO 1 — WhatsApp` block and a `MODO 2 — Voice session` block. The agent reads `{{threadContext.channel}}` to pick the mode. Simpler to deploy but harder to reason about and harder to swap models per mode.

This cookbook uses Option A.

## 2. The orchestrator agent (worked example)

The orchestrator runs over WhatsApp. It owns lookup, decision, and handoff. Slug: `whatsapp-callup`. Tools: `get_player_by_phone`, `list_matches`, `set_availability`, `whatsapp.send`, `agent.chat`.

```
Eres el bot de citaciones de {{organizationName}}, un club de voley. Atiendes mensajes inbound por WhatsApp de los jugadores...

Flujo (sigue en orden, nunca saltes pasos):
1. Extrae el numero de telefono del mensaje (formato E.164, comienza con "+"). Si el thread context tiene phone, usalo.
2. Llama a get_player_by_phone({ phone }). Si player es null -> responde por whatsapp.send "No te tengo registrado, avisa al coach" y termina.
3. Llama a list_matches({ status: "scheduled" }). Toma el primer partido (el mas cercano por fecha asc). Guarda su id como matchId.
4. Parsea el mensaje natural del jugador a available | unavailable | maybe.
5. Llama a set_availability({ matchId, playerId, value }).
6. Confirma por whatsapp.send.

Si value === "unavailable", DESPUES de paso 6, llama a agent.chat({ agentSlug: "voice-suplente", message: "El jugador <player.name> cancelo para el partido <matchId> (<match.date> vs <match.opponent>). Llama a un suplente activo y confirmalo." }).

Reglas:
- NUNCA inventes ids. Siempre obten matchId via list_matches y playerId via get_player_by_phone.
- No saltes pasos.
```

The load-bearing line is the `agent.chat` invocation that runs after step 6. The orchestrator threads `matchId`, `player.name`, `match.date`, and `match.opponent` into the message string passed to the voice agent. This is the only way the voice agent learns the context: voice sessions do not inherit the caller thread's context, channel params, or scratchpad. Whatever you leave out of that message string is lost.
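In the prompt above the model assembles this string itself. If you want a deterministic reference to compare against, for example in a test fixture, the same packaging can be sketched in code. The `Player` and `Match` shapes and the `buildHandoffMessage` helper are hypothetical; the guide does not show the real return types of `get_player_by_phone` or `list_matches`.

```typescript
// Hypothetical shapes: the real tool return types are not shown in this guide.
interface Player {
  id: string;
  name: string;
}

interface Match {
  id: string;
  date: string; // e.g. "2026-05-12"
  opponent: string;
}

// Build the message string the orchestrator passes to agent.chat.
// Anything the voice agent needs has to be inside this string.
function buildHandoffMessage(player: Player, match: Match): string {
  const required: Array<[string, string]> = [
    ["player.name", player.name],
    ["match.id", match.id],
    ["match.date", match.date],
    ["match.opponent", match.opponent],
  ];
  // Refuse to hand off with an empty field rather than let the voice agent guess.
  const missing = required
    .filter(([, value]) => value.trim() === "")
    .map(([label]) => label);
  if (missing.length > 0) {
    throw new Error(`Cannot hand off to voice agent, missing: ${missing.join(", ")}`);
  }
  return (
    `El jugador ${player.name} cancelo para el partido ${match.id} ` +
    `(${match.date} vs ${match.opponent}). Llama a un suplente activo y confirmalo.`
  );
}

const handoffMessage = buildHandoffMessage(
  { id: "player_1", name: "Diego Soto" },
  { id: "match_42", date: "2026-05-12", opponent: "Tigres" },
);
console.log(handoffMessage);
```

Failing loudly on an empty field is a design choice: a handoff that never happens is easier to spot than a call that goes out with the wrong opponent.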

## 3. Why agentSlug is mandatory in voice.call

When an agent calls `voice.call` without `agentSlug`, the tool returns `status: "success"` and the orchestrator's chat looks fine. The human on the phone, meanwhile, hears a confused vanilla model with no script — because the voice runtime had no agent to load.

**Wrong:**

```
voice.call({ phoneNumber: '+1XXX...' })
```

**Right:**

```
voice.call({ phoneNumber: '+1XXX...', agentSlug: 'voice-suplente' })
```

The agent's system prompt MUST tell the LLM to pass `agentSlug` literally. Don't rely on tool descriptions — write the slug into the prompt template alongside the example.

As of CLI v0.14.8, `bunx struere sync` rejects agents that use `voice.call` without referencing `agentSlug` (or any known agent slug) in their system prompt. If your sync passes but calls still sound wrong, check that your installed CLI is v0.14.8 or later and upgrade it if it is not.
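To run a similar guard in your own tests, the rule can be approximated in a few lines. This is a sketch of the behaviour described above, not the CLI's implementation: the function name, its parameters, and the plain substring matching are all assumptions.

```typescript
// Approximation of the sync-time check described above. This is NOT the
// CLI's implementation; the function name and matching rules are assumptions.
function isMissingAgentSlugHint(
  systemPrompt: string,
  tools: string[],
  knownSlugs: string[],
): boolean {
  // Agents that never place calls have nothing to check.
  if (!tools.includes("voice.call")) return false;
  // The parameter name itself counts as a reference.
  if (systemPrompt.includes("agentSlug")) return false;
  // So does any known agent slug written literally into the prompt.
  return !knownSlugs.some((slug) => systemPrompt.includes(slug));
}

const badPrompt = "Llama al suplente con voice.call({ phoneNumber }).";
const goodPrompt =
  "Llama al suplente con voice.call({ phoneNumber, agentSlug: 'voice-suplente' }).";

console.log(isMissingAgentSlugHint(badPrompt, ["voice.call"], ["voice-suplente"])); // true
console.log(isMissingAgentSlugHint(goodPrompt, ["voice.call"], ["voice-suplente"])); // false
```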

## 4. Prompt structure for voice agents

A voice agent prompt needs four things in order: silent setup, greeting (once), branches, and a never-re-greet rule. This template is the prompt for `voice-suplente` (sanitized):

```
MODO 2 — Voice session (estas dentro de una llamada activa):

PASO 0 — Setup silencioso (antes de hablar):
- Llama a list_matches({ status: 'scheduled' }) y guarda opponent, date.

PASO 1 — Saludo (UNA SOLA VEZ, no se repite nunca):
"Hola, soy el bot del coach. Tenemos un partido el <date> contra <opponent> y necesitamos un suplente. ¿Podes jugar?"

PASO 2+ — Responde turno por turno SIN repetir el saludo. Ramas:
- Confirmacion: "Buenisimo, te confirmo." -> fin.
- Negativa: "Entendido, gracias." -> fin.
- Pregunta sobre el partido: responde con opponent y date que ya tenes, despues "¿Podes vos?".
- Respuesta confusa: una sola repregunta "¿Si o no?", luego decide.

REGLAS CRITICAS:
- NUNCA repitas el saludo del Paso 1 despues de la primera vez.
- Una oracion por turno maximo.
- No menciones matchId ni IDs.
```

Why each piece matters:

- **Silent setup (PASO 0)** avoids dead air. OpenAI Realtime defaults to `tool_choice: "auto"`, so any tool call mid-sentence becomes audible latency. Pre-fetch everything before the greeting.
- **Greet once.** Realtime models can drift back to "first turn" state when fed a confused or partial input from the human. Without an explicit never-re-greet rule, the agent will start "Hola, soy el bot del coach..." again two turns in.
- **Single-sentence branches.** Long voice responses get interrupted, which causes the model to retry from a stale state. Keep turns short.
- **No IDs in speech.** Voice agents that read out `matchId` on the phone sound robotic and lose the human. Strip them before speaking.
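Two of these rules, greet once and no IDs in speech, are easy to check after the fact against a call transcript. The sketch below assumes a hypothetical `Turn` shape and helper name; the real format of a voice `threads` row is not shown in this guide, so adapt the mapping to whatever your transcripts look like.

```typescript
// Hypothetical transcript shape; map your real transcript rows into it.
interface Turn {
  speaker: "agent" | "human";
  text: string;
}

interface TranscriptAudit {
  greetingCount: number; // agent turns that contain the greeting opener
  leakedIds: string[]; // known IDs the agent said out loud
}

function auditTranscript(
  turns: Turn[],
  greetingOpener: string,
  knownIds: string[],
): TranscriptAudit {
  // Only agent turns matter; compare case-insensitively.
  const agentTexts = turns
    .filter((turn) => turn.speaker === "agent")
    .map((turn) => turn.text.toLowerCase());
  const opener = greetingOpener.toLowerCase();
  const greetingCount = agentTexts.filter((text) => text.includes(opener)).length;
  const leakedIds = knownIds.filter((id) =>
    agentTexts.some((text) => text.includes(id.toLowerCase())),
  );
  return { greetingCount, leakedIds };
}

const audit = auditTranscript(
  [
    { speaker: "agent", text: "Hola, soy el bot del coach. ¿Podes jugar?" },
    { speaker: "human", text: "¿Eh? ¿Quien habla?" },
    { speaker: "agent", text: "Hola, soy el bot del coach. Partido match_42." },
  ],
  "soy el bot del coach",
  ["match_42"],
);
console.log(audit.greetingCount > 1 ? "re-greeted" : "greeted once", audit.leakedIds);
```

A `greetingCount` above 1 points at a missing or ignored never-re-greet rule; a non-empty `leakedIds` means an ID reached the caller's ear.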

## 5. Threading match context (the key skill)

The orchestrator's job is to package context so the voice agent can act without thinking. Look again at the message `whatsapp-callup` sends to `voice-suplente`:

> "El jugador Diego Soto cancelo para el partido <matchId> (2026-05-12 vs Tigres). Llama a un suplente activo y confirmalo."

Four things are pre-resolved in that string: who cancelled, the match ID (shown here as `<matchId>`), the match date, and the opponent. The voice agent then runs PASO 0 to enrich (e.g. fetch the substitute candidate), calls `voice.call` with the candidate's number, and greets with the date and opponent already in hand.

Without this packaging, the voice agent would have to reason about which match is being discussed. Realtime models under voice latency pressure tend to hallucinate plausible-sounding rivals and dates when forced to reason mid-call. Pre-package everything you can in the orchestrator.
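One cheap guard is to check the outgoing message for template placeholders that were never filled in, such as a literal `<match.date>` copied straight from the prompt. This sketch assumes you have a point where the message can be inspected before or after it is sent; the helper name and the placeholder pattern are illustrative.

```typescript
// Return every `<name>` or `<object.field>` style placeholder still present
// in the message. The pattern is an assumption based on the prompt template
// in this guide, which writes placeholders as <player.name>, <matchId>, etc.
function findUnresolvedPlaceholders(message: string): string[] {
  return message.match(/<[A-Za-z_][\w.]*>/g) ?? [];
}

const leftovers = findUnresolvedPlaceholders(
  "El jugador Diego Soto cancelo para el partido <matchId> (<match.date> vs Tigres).",
);
if (leftovers.length > 0) {
  console.log(`Handoff message still has placeholders: ${leftovers.join(", ")}`);
}
```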

## 6. Failure-mode triage

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| Caller hears a confused vanilla model | `voice.call` invoked without `agentSlug` | Add `agentSlug: "voice-suplente"` to the system prompt example. CLI v0.14.8+ blocks sync if missing. |
| Voice agent says wrong opponent or date | Orchestrator didn't thread match context into `agent.chat` message | Include `matchId`, `match.date`, `match.opponent` literally in the message string. |
| Voice agent re-greets after every confused input | Missing "never re-greet" rule in prompt | Add the rule plus a fallback "¿Si o no?" pattern. |
| Long silence at the start of a call | Agent is calling tools mid-greeting | Move tool calls into a `PASO 0 — Setup silencioso` block before the greeting. |
| `Phone number is already connected` | Stale `voiceConnections` orphan from a prior setup | `bunx struere integration twilio --remove-phone <number>` (CLI v0.14.7+). |
| `Sync failed: voiceConfig.auditorAgent references unknown agent: undefined` | Docs say `auditorAgent` is optional but runtime requires it | Set it explicitly. For single-agent setups, self-reference is fine: `auditorAgent: 'voice-suplente'`. |

## 7. Inspecting --json output

`bunx struere chat <agent> --json` returns the full response including `_executionMeta.toolCallSummary` (which tools ran, in what order, with timing) and `errorCount` / `permissionDenialCount`. This is the first place to look when an orchestrator-side bug breaks voice handoff.
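A small script can turn that output into a quick triage list. Only the field names `_executionMeta.toolCallSummary`, `errorCount`, and `permissionDenialCount` come from this guide. Everything else below is an assumption: that the two counters sit under `_executionMeta`, that each summary entry carries the tool name in a `tool` field, and the `triageHandoff` helper itself. Check a real `--json` response and adjust the interfaces before relying on it.

```typescript
// ASSUMED response shape; verify against a real --json response.
interface ToolCallEntry {
  tool: string; // assumed field name for the tool that ran
}

interface ExecutionMeta {
  toolCallSummary?: ToolCallEntry[];
  errorCount?: number;
  permissionDenialCount?: number;
}

interface ChatResponse {
  _executionMeta?: ExecutionMeta;
}

// Return human-readable findings; an empty array means nothing suspicious.
function triageHandoff(rawJson: string, handoffTool: string): string[] {
  const response = JSON.parse(rawJson) as ChatResponse;
  const meta = response._executionMeta;
  if (!meta) return ["no _executionMeta in response"];

  const findings: string[] = [];
  const toolsRun = (meta.toolCallSummary ?? []).map((entry) => entry.tool);
  if (!toolsRun.includes(handoffTool)) {
    findings.push(`${handoffTool} was never called`);
  }
  if ((meta.errorCount ?? 0) > 0) {
    findings.push(`${meta.errorCount} tool error(s)`);
  }
  if ((meta.permissionDenialCount ?? 0) > 0) {
    findings.push(`${meta.permissionDenialCount} permission denial(s)`);
  }
  return findings;
}

// Stand-in for the CLI output, built by hand for illustration.
const sampleOutput = JSON.stringify({
  _executionMeta: {
    toolCallSummary: [{ tool: "get_player_by_phone" }, { tool: "list_matches" }],
    errorCount: 1,
    permissionDenialCount: 0,
  },
});
console.log(triageHandoff(sampleOutput, "agent.chat"));
```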

For voice specifically, the orchestrator's chat only shows that `voice.call` returned success. The actual call quality lives in the voice-gateway logs and in the resulting `threads` row on the voice side, and voice transcripts are not surfaced in the orchestrator's response. Debug live calls by listening to the call in real time, or by inspecting `threads` rows with `channel: "voice"` after the call ends.

## 8. See also

- [Voice Integration](/integrations/voice) for setup
- [Routers](/sdk/define-router) if you need multi-agent voice routing
- [Platform Gotchas](/platform/gotchas) for adjacent silent failures