Model Management
Octipus’s model management system provides a unified interface for working with multiple LLM providers, with automatic routing, failover, cost tracking, and health monitoring.
Provider Types
Direct Providers
Native integrations that communicate directly with the provider API:
| Provider | Description |
|---|---|
| Ollama | Local LLM inference — auto-detected from OLLAMA_URL |
| OpenAI | OpenAI API — API key stored in vault |
| Anthropic | Anthropic API — API key stored in vault |
| Gemini | Google Gemini API — API key stored in vault |
| OpenRouter | Access 200+ models through a single API — credit-based billing, automatic routing |
CLI Providers
Run models through their vendor CLI as a subprocess. Octipus spawns the CLI, streams in/out, and surfaces the output as a normal agent turn.
| Provider | Description |
|---|---|
| Claude Code | Anthropic’s Claude via the Claude Code CLI |
| Gemini CLI | Google’s Gemini via the Gemini CLI |
| Codex CLI | OpenAI’s Codex via the Codex CLI |
CLI providers are auto-detected and registered when the binary is on PATH.
LiteLLM Proxy
An optional unified proxy that serves as a catch-all fallback. LiteLLM can route to 100+ providers through a single OpenAI-compatible API.

    LITELLM_URL=http://localhost:4000

Provider Router
The provider router tries models in priority order:
- CLI models (Claude Code, Gemini CLI, Codex) — billing and quota handled by each vendor
- Ollama — local models
- OpenAI — cloud API
- Anthropic — cloud API
- Gemini — cloud API
- Grok — cloud API
- DeepSeek — cloud API
- OpenRouter — multi-model proxy with credit tracking
- Voyage — embeddings only
- Custom OpenAI-compatible — self-hosted vLLM, Together, Fireworks, etc. (DB-configured per model)
- Custom Gemini-compatible — Gemini API-compatible upstreams (DB-configured per model)
- LiteLLM — catch-all proxy (optional, only if configured)
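The priority walk above could be sketched roughly as follows. This is an illustration under assumptions, not Octipus's actual router: the `PRIORITY` array, the `ProviderKind` identifiers (other than `custom-openai` and `custom-gemini`, which appear later on this page), and `pickProvider` are hypothetical names, and `isAvailable` stands in for whatever health, quota, and configuration checks the real router performs.

```typescript
// Hypothetical provider identifiers, listed in the documented priority order.
const PRIORITY = [
  "cli", "ollama", "openai", "anthropic", "gemini", "grok", "deepseek",
  "openrouter", "voyage", "custom-openai", "custom-gemini", "litellm",
] as const;

type ProviderKind = (typeof PRIORITY)[number];

// Return the highest-priority provider that can serve the request and is
// currently available, or undefined when every candidate is unavailable.
function pickProvider(
  candidates: ReadonlySet<ProviderKind>,
  isAvailable: (provider: ProviderKind) => boolean,
): ProviderKind | undefined {
  return PRIORITY.find((p) => candidates.has(p) && isAvailable(p));
}
```

In this sketch, failover is just a second call with a stricter `isAvailable`: once a provider is marked unavailable, the next entry in the list wins.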
Topic Routing
Models must be explicitly assigned to topics. When an agent is spawned for a role, it uses the model assigned to that role’s topic.
| Concept | Description |
|---|---|
| Topic | A category like coding, research, general |
| Primary | The preferred model for a topic |
| Backup | The fallback model if the primary is unavailable |
| Unbound topic | If no model is assigned, the agent fails to spawn with a clear error message — there is no silent fallback to the default model |
Configure topic routing through the Models page in the web UI or the API. Every role/topic combination you use must have at least a primary model assigned.
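The primary/backup/unbound behaviour described above might look like the sketch below. The `TopicBinding` shape and `resolveModel` function are hypothetical, and the error thrown when both the primary and the backup are unavailable is an assumption; this page only specifies the unbound-topic failure.

```typescript
interface TopicBinding {
  primary: string; // preferred model id for the topic
  backup?: string; // fallback model id, if one is assigned
}

function resolveModel(
  topic: string,
  bindings: ReadonlyMap<string, TopicBinding>,
  isAvailable: (modelId: string) => boolean,
): string {
  const binding = bindings.get(topic);
  if (!binding) {
    // Unbound topic: fail loudly instead of silently using the default model.
    throw new Error(`No model assigned to topic "${topic}"`);
  }
  if (isAvailable(binding.primary)) return binding.primary;
  if (binding.backup !== undefined && isAvailable(binding.backup)) {
    return binding.backup;
  }
  // Assumed behaviour: nothing usable is also an error.
  throw new Error(`No available model for topic "${topic}"`);
}
```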
Live model discovery
Octipus discovers available models directly from each provider’s list endpoint. There is no static “recommended models” file in the repo: what you see in the picker is whatever the vendor returned on the most recent fetch, run through deterministic curation rules.
| Provider | Endpoint | Discovery file |
|---|---|---|
| OpenAI | GET /v1/models | src/models/providers/discovery/openai.ts |
| Anthropic | GET /v1/models (with anthropic-version) | src/models/providers/discovery/anthropic.ts |
| Google Gemini | GET /v1beta/models | src/models/providers/discovery/gemini.ts |
| OpenRouter | GET /api/v1/models | src/models/providers/discovery/openrouter.ts |
| Ollama | GET /api/tags | src/models/providers/discovery/ollama.ts |
Curation rules
Each fresh fetch is filtered and sorted by src/models/providers/discovery/curation.ts:
- Recency window: drop models older than ~18 months (when the API exposes a date)
- Capability gate: drop models the API marks as non-tool-capable (OpenRouter); drop embedding/Whisper/TTS/DALL·E/Imagen/Veo/AQA by id pattern
- Family deduplication: collapse dated snapshots (`claude-sonnet-4-5-20250929`) to their alias (`claude-sonnet-4-5`) when the alias is also returned
- Preview filter: hide preview/experimental/dated unless `?preview=true` is passed
- Tier inference: group results into `flagship`/`balanced`/`reasoning`/`cheap` from id heuristics
- Sort: flagship → balanced → reasoning → cheap, then `createdAt` desc
Cached for 6h in Valkey; stale-while-revalidate on errors. Force a refresh with ?refresh=true.
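Two of the curation rules, family deduplication and the final sort, can be sketched as below. This is not the contents of curation.ts: the `DiscoveredModel` shape, the `-YYYYMMDD` suffix pattern, and the function names are assumptions made for illustration, and the recency, capability, and preview filters are left out.

```typescript
type Tier = "flagship" | "balanced" | "reasoning" | "cheap";

interface DiscoveredModel {
  id: string;
  tier: Tier;
  createdAt: number; // Unix seconds; assumed to be present on every entry
}

const TIER_ORDER: Tier[] = ["flagship", "balanced", "reasoning", "cheap"];

// Strip a trailing -YYYYMMDD snapshot suffix (assumed naming pattern).
function aliasOf(id: string): string {
  return id.replace(/-\d{8}$/, "");
}

function curate(models: DiscoveredModel[]): DiscoveredModel[] {
  const ids = new Set(models.map((m) => m.id));
  return models
    // Drop a dated snapshot only when its alias was also returned.
    .filter((m) => aliasOf(m.id) === m.id || !ids.has(aliasOf(m.id)))
    // Tier order first, then newest first within a tier.
    .sort(
      (a, b) =>
        TIER_ORDER.indexOf(a.tier) - TIER_ORDER.indexOf(b.tier) ||
        b.createdAt - a.createdAt,
    );
}
```

Because `filter` returns a new array, the input list is left unmodified by the sort.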
Custom Providers
For endpoints that aren’t backed by a first-party provider class, Octipus offers two custom-provider flavors. Pick the one that matches the upstream wire format:
| Flavor | provider value | Wire format | Use for |
|---|---|---|---|
| Custom OpenAI-compatible | custom-openai | OpenAI /v1/chat/completions | vLLM, Together, Groq, Fireworks, DeepInfra, internal OpenAI-shaped proxies |
| Custom Gemini-compatible | custom-gemini | Native Google Gemini (candidates[].content.parts[]) | Vertex AI, Google AI Studio (native), Gemini-fronting proxies |
Configuration is per model, not per provider — the endpoint URL and key reference live on each model row, so you can register several different upstreams side by side. Each model carries its own apiKeyRef (a vault entry name, or an env:VAR_NAME reference), so there is no single shared key.
- Secrets page → add a vault entry (e.g. `together_api_key`) with the upstream’s bearer token. (Or skip this and reference an env var directly with `apiKeyRef: 'env:TOGETHER_API_KEY'`.)
- Models page → Add Model with:
  - Provider: `custom-openai` or `custom-gemini`
  - Endpoint URL: the base URL (no trailing slash); include `/v1` if an OpenAI-compatible upstream expects it (e.g. `https://api.together.xyz/v1`, `http://my-vllm:8000/v1`)
  - Model ID: the model name the upstream uses (e.g. `meta-llama/Llama-3.3-70B-Instruct-Turbo`)
  - API key reference: the vault entry name or `env:VAR_NAME` for this model
  - Auth scheme: `bearer` (default), `header` (custom header name), or `query` (query param)
The Test button validates connectivity against the configured endpoint and auth scheme.
See Custom Providers for the full schema — auth schemes, apiKeyRef resolution order, Gemini request envelopes, and tool-calling support.
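As a rough illustration of how a per-model `apiKeyRef` and auth scheme might be applied, consider the sketch below. It is not the actual implementation, and the real resolution order is the one documented on the Custom Providers page. The function names, the `AuthScheme` shape, and the `headerName`/`paramName` fields are hypothetical; the vault is modelled as a plain lookup function and the environment as a plain object.

```typescript
type SecretLookup = (name: string) => string | undefined;

// Resolve "env:VAR_NAME" from the environment, anything else from the vault.
function resolveApiKey(
  apiKeyRef: string,
  vault: SecretLookup,
  env: Record<string, string | undefined>,
): string {
  const ENV_PREFIX = "env:";
  const key = apiKeyRef.startsWith(ENV_PREFIX)
    ? env[apiKeyRef.slice(ENV_PREFIX.length)]
    : vault(apiKeyRef);
  if (key === undefined || key === "") {
    throw new Error(`apiKeyRef "${apiKeyRef}" did not resolve to a key`);
  }
  return key;
}

// Hypothetical representation of the three auth schemes.
type AuthScheme =
  | { kind: "bearer" }
  | { kind: "header"; headerName: string }
  | { kind: "query"; paramName: string };

function applyAuth(
  endpointUrl: string,
  scheme: AuthScheme,
  key: string,
): { url: string; headers: Record<string, string> } {
  switch (scheme.kind) {
    case "bearer":
      return { url: endpointUrl, headers: { Authorization: `Bearer ${key}` } };
    case "header":
      return { url: endpointUrl, headers: { [scheme.headerName]: key } };
    case "query": {
      const sep = endpointUrl.includes("?") ? "&" : "?";
      const url = `${endpointUrl}${sep}${scheme.paramName}=${encodeURIComponent(key)}`;
      return { url, headers: {} };
    }
  }
}
```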
Model Registry
All model configurations are stored in the database (not environment variables):
- Default model: One model is marked as the default for unrouted messages
- Per-model settings: Enable/disable, provider, topic roles, custom parameters
- Extra body parameters: Per-model custom parameters via `metadata.extraBody` (e.g., `{ think: false }` for Qwen3)
Model Test Endpoint
Before registering a model, you can validate connectivity:

    POST /api/models/test

This endpoint checks LiteLLM first, then direct Ollama, and supports namespaced model IDs.
Cost Tracking
The system tracks per-model token costs:
- Input tokens: Tokens sent to the model
- Output tokens: Tokens generated by the model
- Cost calculation: Based on per-model pricing configuration
- Aggregation: Costs aggregated by model, time period, and user
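The cost calculation and per-model aggregation can be illustrated with a small sketch. The `Pricing` and `Usage` shapes, the per-million-token pricing unit, and both function names are assumptions; this page does not specify how pricing is stored.

```typescript
interface Pricing {
  inputPerMillion: number;  // USD per 1M input tokens (assumed unit)
  outputPerMillion: number; // USD per 1M output tokens (assumed unit)
}

interface Usage {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Cost of one request under the model's pricing.
function costOf(usage: Usage, pricing: Pricing): number {
  return (
    (usage.inputTokens * pricing.inputPerMillion +
      usage.outputTokens * pricing.outputPerMillion) /
    1_000_000
  );
}

// Sum costs per model; usage without configured pricing is skipped here.
function totalByModel(
  usages: Usage[],
  pricing: ReadonlyMap<string, Pricing>,
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const u of usages) {
    const p = pricing.get(u.model);
    if (!p) continue;
    totals.set(u.model, (totals.get(u.model) ?? 0) + costOf(u, p));
  }
  return totals;
}
```

Aggregating by time period or user would follow the same pattern with a different grouping key.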
Quota Tracking
Valkey-backed daily usage tracking prevents exceeding provider limits:
- Daily quotas: Track usage per model per day
- Auto-clearing: Quotas reset automatically at the start of each day
- Exhaustion detection: When a model’s quota is exhausted, the router automatically falls back to the next available model
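A minimal in-memory sketch of the daily quota idea follows. A `Map` stands in for Valkey, the `DailyQuota` class and its methods are hypothetical, and using the UTC date in the key is an assumption about what "start of each day" means.

```typescript
class DailyQuota {
  private readonly counts = new Map<string, number>();
  private readonly limits: ReadonlyMap<string, number>;

  constructor(limits: ReadonlyMap<string, number>) {
    this.limits = limits;
  }

  // One counter per model per UTC day; a new day means a fresh key,
  // which is how usage "resets" in this sketch.
  private key(model: string, now: Date): string {
    return `${model}:${now.toISOString().slice(0, 10)}`;
  }

  record(model: string, now: Date, amount = 1): void {
    const k = this.key(model, now);
    this.counts.set(k, (this.counts.get(k) ?? 0) + amount);
  }

  isExhausted(model: string, now: Date): boolean {
    const limit = this.limits.get(model);
    if (limit === undefined) return false; // no limit configured
    return (this.counts.get(this.key(model, now)) ?? 0) >= limit;
  }
}
```

A router could consult `isExhausted` before selecting a model and move on to the next candidate when it returns true.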
Health Checks
Periodic health monitoring for all configured providers:
- Latency measurement: Tracks response times for each provider
- Availability status: Marks providers as healthy or unhealthy
- Auto-recovery: Unhealthy providers are re-checked periodically and restored when available
Access health status via:

    GET /api/models/health

Per-Provider Rate Limiting
Each provider has independent rate limiting with adaptive concurrency and circuit breaker protection. This prevents overloading providers and handles transient failures gracefully.
Supported Providers
Rate limiting is configured per-provider for:
- Ollama — local inference (concurrency limited by GPU memory)
- OpenAI — cloud API (tokens-per-minute and requests-per-minute)
- Anthropic — cloud API (requests-per-minute)
- Gemini — cloud API (requests-per-minute)
- DeepSeek — cloud API (requests-per-minute)
- OpenRouter — multi-model proxy (credit-based limits)
- LiteLLM — proxy (inherits downstream limits)
Adaptive Concurrency
The rate limiter dynamically adjusts the number of concurrent requests based on provider response times:
- Scale up: When responses are fast, concurrency increases to maximize throughput
- Scale down: When latency rises or errors occur, concurrency decreases to reduce pressure
- Per-provider: Each provider has its own concurrency window
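This page does not say which adjustment algorithm is used. One common way to get the scale-up/scale-down behaviour is additive-increase/multiplicative-decrease (AIMD), sketched below as an assumption rather than a description of Octipus's limiter. The `AdaptiveLimit` class, its options, and the latency threshold are all hypothetical.

```typescript
class AdaptiveLimit {
  private limit: number;
  private readonly min: number;
  private readonly max: number;
  private readonly targetLatencyMs: number;

  constructor(opts: { initial: number; min: number; max: number; targetLatencyMs: number }) {
    this.limit = opts.initial;
    this.min = opts.min;
    this.max = opts.max;
    this.targetLatencyMs = opts.targetLatencyMs;
  }

  // Current number of concurrent requests allowed.
  getLimit(): number {
    return this.limit;
  }

  // Call once per completed request.
  observe(latencyMs: number, ok: boolean): void {
    if (!ok || latencyMs > this.targetLatencyMs) {
      // Multiplicative decrease on errors or slow responses.
      this.limit = Math.max(this.min, Math.floor(this.limit / 2));
    } else {
      // Additive increase while responses are fast.
      this.limit = Math.min(this.max, this.limit + 1);
    }
  }
}
```

Keeping one instance per provider gives each provider its own concurrency window.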
Circuit Breaker
When a provider experiences repeated failures, the circuit breaker trips to prevent cascading issues:
| State | Behavior |
|---|---|
| Closed | Normal operation — requests flow through |
| Open | Provider is failing — requests are immediately rejected and routed to fallback |
| Half-Open | After a cooldown period, a single test request is sent to check recovery |
The circuit breaker transitions back to Closed once the provider responds successfully in the half-open state. This integrates with the provider router’s failover logic to automatically route requests to healthy providers.
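The three states and their transitions can be expressed as a small state machine. The sketch below is illustrative only: the `CircuitBreaker` class, the failure threshold, and the cooldown parameter are hypothetical, and time is passed in explicitly (in milliseconds) to keep the example deterministic.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold: number, cooldownMs: number) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  getState(): BreakerState {
    return this.state;
  }

  // Returns true if a request may be sent at time `now`.
  allow(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // cooldown elapsed: let one probe through
      return true;
    }
    // While half-open, further requests wait for the probe's result.
    return this.state === "closed";
  }

  onSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  onFailure(now: number): void {
    this.failures += 1;
    // A failed probe, or too many failures in a row, opens the circuit.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }
}
```

When `allow` returns false, the caller would hand the request to the router's fallback instead of waiting.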
Configuration
Environment Variables

    LITELLM_URL=http://localhost:4000    # LiteLLM proxy (optional)
    OLLAMA_URL=http://localhost:11434    # Local Ollama (optional)
    OPENROUTER_API_KEY=sk-or-...         # OpenRouter API key (optional)

OpenRouter Setup
OpenRouter provides access to 200+ models from multiple providers through a single API key:
- Create an account at openrouter.ai and add credits
- Store your API key in the vault as `openrouter_api_key` or set `OPENROUTER_API_KEY`
- Register models with `provider: 'openrouter'` and `modelId` in `provider/model` format (e.g., `minimax/minimax-01`, `nvidia/llama-3.1-nemotron-ultra-253b-v1`)
- Use the OpenRouter model search in the web UI to browse and register available models
Useful API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/models | List all models |
| POST | /api/models | Register a new model |
| POST | /api/models/test | Test model connectivity |
| POST | /api/models/:id/default | Set as default model |
| GET | /api/models/routing | View topic routing |
| GET | /api/models/health | Provider health status |
| GET | /api/models/cli/status | CLI tool availability |
| GET | /api/models/cli/quota | CLI quota status |