Voice & Phone Calls
Octipus includes voice capabilities at two levels: local voice I/O for hands-free interaction, and phone calls for making and receiving actual calls through telephony providers.
Local Voice I/O
Section titled “Local Voice I/O”Speech-to-Text (STT)
Section titled “Speech-to-Text (STT)”| Engine | Type | Description |
|---|---|---|
| Whisper.cpp | Local | C++ Whisper — fast, private, fully offline |
| Faster-Whisper | Local | Python + CTranslate2 — optimized for speed |
| OpenAI Whisper | Cloud | API fallback when local is unavailable |
API: POST /api/voice/transcribe — send base64-encoded audio, receive transcribed text.
Text-to-Speech (TTS)
Section titled “Text-to-Speech (TTS)”| Engine | Type | Description |
|---|---|---|
| Piper | Local | Neural TTS — fast, high quality, fully offline |
| Edge TTS | Cloud | Microsoft Edge TTS — 200+ neural voices |
| Coqui | Local | Neural TTS — multi-language support |
Wake Word Detection
Section titled “Wake Word Detection”| Engine | Type | Description |
|---|---|---|
| Sherpa-ONNX | Local | ONNX-based keyword spotting — fast, private |
| Picovoice/Porcupine | Cloud | High accuracy — requires API key |
| VAD Fallback | Local | Voice Activity Detection — triggers on any speech |
Phone Calls
Section titled “Phone Calls”Make and receive actual phone calls through telephony providers. Conversations are powered by the same LLM models you use for chat — with a fast, direct path optimized for voice latency.
Supported Providers
Section titled “Supported Providers”| Provider | Protocol | What You Need |
|---|---|---|
| Twilio | Programmable Voice + Media Streams | Account SID + Auth Token |
| Telnyx | Call Control v2 | API Key + Connection ID |
| Plivo | Voice API + XML | Auth ID + Auth Token |
All three providers support outbound calls, inbound calls, and webhook-based conversation flow. Phone numbers are auto-detected from your provider account.
Quick Setup
Section titled “Quick Setup”1. Store credentials in the vault
Add your provider’s API credentials in Settings > Vault:
For Twilio: twilio_account_sid and twilio_auth_token
For Telnyx: telnyx_api_key and telnyx_connection_id
For Plivo: plivo_auth_id and plivo_auth_token
2. Configure the provider
Set two settings (via Settings page or API):
| Setting | Value | Example |
|---|---|---|
voice.telephonyProvider | Your provider name | twilio |
voice.publicUrl | Your public webhook URL | https://abc123.ngrok.io |
3. Assign a model to the “voice” topic
In the Models page, assign a fast model to the voice topic. Local Ollama models give the lowest latency.
That’s it — no other configuration needed. The expert prompt is loaded from the Communicator expert automatically.
Twilio Setup Guide
Section titled “Twilio Setup Guide”- Create a Twilio account at twilio.com
- Get credentials from Twilio Console → Account → API keys & tokens:
- Account SID: starts with
AC+ 32 hex chars (34 total) - Auth Token: 32 hex chars
- Account SID: starts with
- Store in vault: Add
twilio_account_sidandtwilio_auth_tokenin Settings → Vault - Phone number: Auto-detected from your Twilio account — no manual config needed. Buy a number in the Twilio Console if you don’t have one.
- Set webhook URL: Must be publicly accessible (use ngrok or Cloudflare Tunnel)
- Configure Twilio webhook: Point your Twilio phone number’s webhook to
https://your-url/api/voice/webhook/twilio
Common Issues
Section titled “Common Issues”| Problem | Cause | Fix |
|---|---|---|
| HTTP 401 | Auth Token mismatch | Regenerate in Twilio Console → API keys & tokens |
| HTTP 403 | Account suspended, old token, or sub-account mismatch | Verify in Twilio Console |
| HTTP 404 | Invalid Account SID | Check vault secret twilio_account_sid |
| No phone number detected | Account has no numbers | Buy one in Twilio Console |
| Webhook timeout | Network/firewall | Check public URL accessibility |
Provider Comparison
Section titled “Provider Comparison”| Feature | Twilio | Telnyx | Plivo |
|---|---|---|---|
| Auth | Basic (SID:Token) | Bearer token | Basic (ID:Token) |
| Call Control | TwiML (XML) | Call Control v2 (JSON) | XML |
| Speech Gather | <Gather> | API command | <GetInput> |
| Webhook Verify | HMAC-SHA1 | Ed25519 | HMAC-SHA256 |
| End Call | POST Status=completed | POST hangup action | DELETE |
| Phone Detection | Auto from account | Manual config | Manual config |
How It Works
Section titled “How It Works”Outbound Calls — Notify Mode
Section titled “Outbound Calls — Notify Mode”The agent speaks a message and hangs up. Good for alerts, reminders, and status updates.
User: "Call +1234567890 and tell them the server is back up"→ Agent uses initiate_call tool (notify mode)→ Provider dials the number→ Person answers → TTS speaks message → HangupOutbound Calls — Conversation Mode
Section titled “Outbound Calls — Conversation Mode”Interactive voice exchange with the caller. The assistant listens, thinks, and responds — like a phone conversation.
User: "Call the client and discuss the project timeline"→ Agent uses initiate_call tool (conversation mode)→ Provider dials → Person answers → TTS speaks greeting→ Person speaks → STT transcribes → LLM responds → TTS speaks back→ Repeat until either side hangs upFast Conversation Path
Section titled “Fast Conversation Path”Voice calls bypass the orchestrator entirely for low latency:
Caller speaks → Provider STT (~1s) → Direct LLM call (~1-3s) → Provider TTS (~0.5s)~2-5 seconds per turn — no classification, no worker spawning, no tool execution. The model assigned to the voice topic is called directly with the Communicator expert’s system prompt plus voice-specific instructions.
Tool Actions
Section titled “Tool Actions”Agents interact with phone calls through 5 tool actions:
| Action | Description | Permission |
|---|---|---|
initiate_call | Start a call (notify or conversation mode) | Requires approval |
continue_call | Send next message in active conversation | Auto-approved |
end_call | Hang up an active call | Auto-approved |
get_status | Check call state | Auto-approved |
list_calls | List all active calls | Auto-approved |
The voice_call tool is assigned to the Communication role and available to all communication-focused agents.
Inbound Calls
Section titled “Inbound Calls”Inbound calls are disabled by default for security. To enable:
| Setting | Value | Description |
|---|---|---|
voice.inboundPolicy | allowlist | Only accept calls from listed numbers |
voice.inboundAllowFrom | ["+1234567890"] | Allowed caller phone numbers (E.164) |
Set voice.inboundPolicy to open to accept calls from any number (not recommended for production).
Configure your provider to send webhooks to:
https://your-public-url/api/voice/webhook/twilio(Replace twilio with telnyx or plivo as appropriate.)
API Endpoints
Section titled “API Endpoints”| Method | Path | Description |
|---|---|---|
POST | /api/voice/transcribe | Transcribe audio (local STT) |
GET | /api/voice/status | Voice subsystem status |
POST | /api/voice/webhook/:provider | Telephony webhook |
GET | /api/voice/calls | List active calls |
GET | /api/voice/telephony/health | Provider health check |
Security
Section titled “Security”- Webhook signature verification for all providers (Twilio HMAC-SHA1, Telnyx Ed25519, Plivo HMAC-SHA256)
- Permission gating — initiating calls requires user approval
- Inbound filtering — allowlist-based caller ID filtering
- Conversation isolation — each call has its own context, no cross-call leakage