Model Management
Octipus’s model management system provides a unified interface for working with multiple LLM providers, with automatic routing, failover, cost tracking, and health monitoring.
Provider Types
Direct Providers
Native integrations that communicate directly with the provider API:
| Provider | Description |
|---|---|
| Ollama | Local LLM inference — auto-detected from OLLAMA_URL |
| OpenAI | OpenAI API — API key stored in vault |
| Anthropic | Anthropic API — API key stored in vault |
| Gemini | Google Gemini API — API key stored in vault |
| OpenRouter | Access 200+ models through a single API — credit-based billing, automatic routing |
CLI Providers
Subscription-based models (no per-token API cost) accessed via CLI tools:
| Provider | Description |
|---|---|
| Claude Code | Anthropic’s Claude via the Claude Code CLI |
| Gemini CLI | Google’s Gemini via the Gemini CLI |
| Codex CLI | OpenAI’s Codex via the Codex CLI |
CLI providers are automatically detected and registered when available on the system.
LiteLLM Proxy
An optional unified proxy that serves as a catch-all fallback. LiteLLM can route to 100+ providers through a single OpenAI-compatible API.
```
LITELLM_URL=http://localhost:4000
```
Provider Router
The provider router tries models in priority order; a sketch of the fallback loop follows the list:
1. CLI models — free subscription-based (Claude Code, Gemini CLI, Codex)
2. Ollama — local models
3. OpenAI — cloud API
4. Anthropic — cloud API
5. Gemini — cloud API
6. OpenRouter — multi-model proxy with credit tracking
7. LiteLLM — catch-all proxy
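A minimal TypeScript sketch of this fallback loop; `callProvider` is a stand-in for the real provider clients, and the error-as-fallthrough behavior is assumed rather than confirmed:

```ts
// Priority-order fallback; the order mirrors the list above.
type Provider =
  | 'cli' | 'ollama' | 'openai' | 'anthropic'
  | 'gemini' | 'openrouter' | 'litellm';

const PRIORITY: Provider[] = [
  'cli', 'ollama', 'openai', 'anthropic', 'gemini', 'openrouter', 'litellm',
];

// Stand-in: a real implementation would dispatch to the provider's client
// and throw on unavailability, unhealthy status, or exhausted quota.
async function callProvider(provider: Provider, prompt: string): Promise<string> {
  throw new Error(`${provider} not configured`);
}

async function route(prompt: string): Promise<string> {
  for (const provider of PRIORITY) {
    try {
      return await callProvider(provider, prompt); // first success wins
    } catch {
      // Fall through to the next provider in priority order.
    }
  }
  throw new Error('All providers exhausted');
}
```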
Topic Routing
Models can be assigned to topics with primary and backup roles for automatic failover:
| Concept | Description |
|---|---|
| Topic | A category like coding, analysis, general |
| Primary | The preferred model for a topic |
| Backup | The fallback model if the primary is unavailable |
Configure topic routing through the Models page in the web UI or the API.
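The current routing table can be read back through the documented `GET /api/models/routing` endpoint; the response shape in the comment below is an assumption, not the documented contract:

```ts
// Read back topic → { primary, backup } assignments.
const routing = await (await fetch('/api/models/routing')).json();
// e.g. { coding: { primary: 'claude-sonnet', backup: 'qwen3' } } (assumed shape)
console.log(routing);
```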
Model Registry
All model configurations are stored in the database (not environment variables):
- Default model: One model is marked as the default for unrouted messages
- Per-model settings: Enable/disable, provider, topic roles, custom parameters
- Extra body parameters: Per-model custom parameters via `metadata.extraBody` (e.g., `{ think: false }` for Qwen3); see the sketch after this list
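A hedged registration example showing where `metadata.extraBody` fits; `POST /api/models` is documented below, but the other payload field names are assumptions:

```ts
// Register a Qwen3 model on Ollama with thinking disabled via
// metadata.extraBody. Field names other than metadata.extraBody are assumed.
await fetch('/api/models', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    provider: 'ollama',   // assumed field name
    modelId: 'qwen3',     // assumed field name
    enabled: true,        // assumed field name
    metadata: {
      extraBody: { think: false }, // per-model custom parameters (from the docs)
    },
  }),
});
```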
Model Test Endpoint
Before registering a model, you can validate connectivity:
```
POST /api/models/test
```
This endpoint checks LiteLLM first, then direct Ollama, and supports namespaced model IDs.
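An illustrative request; the body field name and response shape are assumptions, not the documented contract:

```ts
// Validate connectivity for a namespaced model ID before registering it.
const res = await fetch('/api/models/test', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ modelId: 'ollama/qwen3' }), // field name assumed
});
console.log(await res.json()); // e.g. { ok: true, latencyMs: 42 } (assumed shape)
```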
Cost Tracking
The system tracks per-model token costs:
- Input tokens: Tokens sent to the model
- Output tokens: Tokens generated by the model
- Cost calculation: Based on per-model pricing configuration (a worked sketch follows the list)
- Aggregation: Costs aggregated by model, time period, and user
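As a worked example of the calculation, assuming per-million-token pricing fields (a common convention, not a confirmed Octipus schema):

```ts
// Per-request cost under assumed per-million-token pricing fields.
interface ModelPricing {
  inputPerMTok: number;  // USD per 1M input tokens (assumed unit)
  outputPerMTok: number; // USD per 1M output tokens (assumed unit)
}

function requestCost(p: ModelPricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * p.inputPerMTok
       + (outputTokens / 1_000_000) * p.outputPerMTok;
}

// 1,200 input + 300 output tokens at $3 / $15 per MTok ≈ $0.0081
const cost = requestCost({ inputPerMTok: 3, outputPerMTok: 15 }, 1_200, 300);
console.log(cost.toFixed(4)); // "0.0081"
```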
Quota Tracking
Redis-backed daily usage tracking prevents exceeding provider limits:
- Daily quotas: Track usage per model per day
- Auto-clearing: Quotas reset automatically at the start of each day
- Exhaustion detection: When a model’s quota is exhausted, the router automatically falls back to the next available model (a Redis sketch follows the list)
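A minimal sketch of the daily counter, assuming ioredis as the client and a `quota:{model}:{day}` key scheme that expires at the next UTC midnight (all three are assumptions):

```ts
import Redis from 'ioredis'; // assumed client; not confirmed by the docs

const redis = new Redis(); // localhost:6379

// Key scheme (assumed): quota:<modelId>:<YYYY-MM-DD>, expiring just after
// UTC midnight so counters auto-clear at the start of each day.
async function recordUsage(modelId: string, tokens: number): Promise<number> {
  const day = new Date().toISOString().slice(0, 10);
  const key = `quota:${modelId}:${day}`;
  const total = await redis.incrby(key, tokens);
  const nextMidnightMs = new Date(`${day}T23:59:59.999Z`).getTime() + 1;
  await redis.pexpireat(key, nextMidnightMs);
  return total;
}

// Router-side check: when the day's budget is spent, fall back to the
// next available model.
async function quotaExhausted(modelId: string, dailyLimit: number): Promise<boolean> {
  const day = new Date().toISOString().slice(0, 10);
  const used = Number(await redis.get(`quota:${modelId}:${day}`)) || 0;
  return used >= dailyLimit;
}
```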
Health Checks
Periodic health monitoring for all configured providers:
- Latency measurement: Tracks response times for each provider
- Availability status: Marks providers as healthy or unhealthy
- Auto-recovery: Unhealthy providers are re-checked periodically and restored when available
Access health status via:
```
GET /api/models/health
```
Per-Provider Rate Limiting
Each provider has independent rate limiting with adaptive concurrency and circuit breaker protection. This prevents overloading providers and handles transient failures gracefully.
Supported Providers
Rate limiting is configured per-provider (an illustrative config sketch follows the list) for:
- Ollama — local inference (concurrency limited by GPU memory)
- OpenAI — cloud API (tokens-per-minute and requests-per-minute)
- Anthropic — cloud API (requests-per-minute)
- Gemini — cloud API (requests-per-minute)
- DeepSeek — cloud API (requests-per-minute)
- OpenRouter — multi-model proxy (credit-based limits)
- LiteLLM — proxy (inherits downstream limits)
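The per-provider configuration might take a shape like the following; field names and values are illustrative, not Octipus defaults:

```ts
// Illustrative per-provider limit shapes; all fields and values assumed.
interface ProviderLimits {
  maxConcurrency: number;     // parallel in-flight requests
  requestsPerMinute?: number; // RPM cap where the provider enforces one
  tokensPerMinute?: number;   // TPM cap (OpenAI-style)
}

const limits: Record<string, ProviderLimits> = {
  ollama:     { maxConcurrency: 2 },                // bound by GPU memory
  openai:     { maxConcurrency: 8, requestsPerMinute: 500, tokensPerMinute: 200_000 },
  anthropic:  { maxConcurrency: 8, requestsPerMinute: 300 },
  gemini:     { maxConcurrency: 8, requestsPerMinute: 300 },
  deepseek:   { maxConcurrency: 8, requestsPerMinute: 300 },
  openrouter: { maxConcurrency: 8 },                // credit-based limits
  litellm:    { maxConcurrency: 16 },               // inherits downstream limits
};
```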
Adaptive Concurrency
The rate limiter dynamically adjusts the number of concurrent requests based on provider response times; a minimal sketch follows the list:
- Scale up: When responses are fast, concurrency increases to maximize throughput
- Scale down: When latency rises or errors occur, concurrency decreases to reduce pressure
- Per-provider: Each provider has its own concurrency window
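This pattern is essentially additive-increase/multiplicative-decrease (AIMD); a minimal per-provider sketch with illustrative thresholds, since Octipus's exact algorithm is not specified:

```ts
// AIMD-style window: grow by one while the provider is fast, halve on
// slow or failed responses. Thresholds here are illustrative.
class AdaptiveConcurrency {
  private limit: number;

  constructor(
    private readonly min = 1,
    private readonly max = 16,
    private readonly latencyTargetMs = 2_000,
  ) {
    this.limit = min;
  }

  get current(): number {
    return this.limit; // current per-provider concurrency window
  }

  record(latencyMs: number, ok: boolean): void {
    if (ok && latencyMs < this.latencyTargetMs) {
      this.limit = Math.min(this.max, this.limit + 1); // scale up
    } else {
      this.limit = Math.max(this.min, Math.floor(this.limit / 2)); // scale down
    }
  }
}
```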
Circuit Breaker
When a provider experiences repeated failures, the circuit breaker trips to prevent cascading issues:
| State | Behavior |
|---|---|
| Closed | Normal operation — requests flow through |
| Open | Provider is failing — requests are immediately rejected and routed to fallback |
| Half-Open | After a cooldown period, a single test request is sent to check recovery |
The circuit breaker transitions back to Closed once the provider responds successfully in the half-open state. This integrates with the provider router’s failover logic to automatically route requests to healthy providers.
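A minimal state machine matching the table above, with an illustrative failure threshold and cooldown:

```ts
// Minimal circuit breaker matching the Closed/Open/Half-Open states above.
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5, // illustrative
    private readonly cooldownMs = 30_000,  // illustrative
  ) {}

  canRequest(): boolean {
    if (this.state === 'open') {
      if (Date.now() - this.openedAt < this.cooldownMs) return false; // reject fast
      this.state = 'half-open'; // cooldown elapsed: allow a single probe
      return true;
    }
    return this.state === 'closed'; // half-open: probe already in flight
  }

  onSuccess(): void {
    // Probe (or normal request) succeeded: resume normal operation.
    this.state = 'closed';
    this.failures = 0;
  }

  onFailure(): void {
    this.failures++;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open'; // trip: router fails over to the next provider
      this.openedAt = Date.now();
      this.failures = 0;
    }
  }
}
```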
Configuration
Environment Variables
```
LITELLM_URL=http://localhost:4000    # LiteLLM proxy (optional)
OLLAMA_URL=http://localhost:11434    # Local Ollama (optional)
OPENROUTER_API_KEY=sk-or-...         # OpenRouter API key (optional)
```
OpenRouter Setup
OpenRouter provides access to 200+ models from multiple providers through a single API key:
1. Create an account at openrouter.ai and add credits
2. Store your API key in the vault as `openrouter_api_key` or set `OPENROUTER_API_KEY`
3. Register models with `provider: 'openrouter'` and a `modelId` in `provider/model` format, e.g., `minimax/minimax-01` or `nvidia/llama-3.1-nemotron-ultra-253b-v1` (a request example follows these steps)
4. Use the OpenRouter model search in the web UI to browse and register available models
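A request sketch for step 3, reusing the documented `POST /api/models` endpoint; the payload field names are assumptions:

```ts
// Register an OpenRouter model (step 3). modelId uses provider/model format.
await fetch('/api/models', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    provider: 'openrouter',        // assumed field name
    modelId: 'minimax/minimax-01', // provider/model format (assumed field name)
  }),
});
```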
Useful API Endpoints
Section titled “Useful API Endpoints”| Method | Endpoint | Description |
|---|---|---|
| GET | /api/models | List all models |
| POST | /api/models | Register a new model |
| POST | /api/models/test | Test model connectivity |
| POST | /api/models/:id/default | Set as default model |
| GET | /api/models/routing | View topic routing |
| GET | /api/models/health | Provider health status |
| GET | /api/models/cli/status | CLI tool availability |
| GET | /api/models/cli/quota | CLI quota status |
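A short tour of a few of these endpoints together; the response shapes are assumptions:

```ts
// List models, set the first as default, then check provider health.
// Response shapes (array with .id, health payload) are assumed.
const models = await (await fetch('/api/models')).json();
await fetch(`/api/models/${models[0].id}/default`, { method: 'POST' });
const health = await (await fetch('/api/models/health')).json();
console.log(health);
```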