Agent Runtime
The agent runtime is the execution engine at the heart of Octipus. It manages the lifecycle of autonomous agents that follow a Thought-Action-Observation loop to accomplish tasks.
Architecture Overview
The runtime consists of three main classes arranged in an inheritance hierarchy:
BaseAgentWorker (abstract)
├── AgentWorker — LLM-based agent (OpenAI SDK / LiteLLM)
└── CLIAgentWorker — CLI tool agent (Claude Code, Gemini CLI, Codex)
BaseAgentWorker
The abstract base class that defines the agent lifecycle:
- Event emission: Publishes `thought`, `action`, `observation`, `error`, and `status` events
- Iteration loop: Runs the think-act-observe cycle up to `AGENT_MAX_ITERATIONS` times
- Token budget: Tracks cumulative token usage and stops agents before they exceed the per-agent limit
- Timeout enforcement: Checks wall-clock time before each LLM call
- Status management: Tracks agent state (`running`, `paused`, `completed`, `failed`, `stopped`)
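The lifecycle above can be sketched as a single loop that checks the timeout and token budget before each iteration. This is a minimal illustration, not Octipus's actual API: the class layout, attribute names, and the choice to mark a capped-out agent as `failed` are assumptions.

```python
import time

class BaseAgentWorker:
    """Illustrative sketch of the think-act-observe lifecycle."""

    def __init__(self, max_iterations=50, token_budget=100_000, timeout_ms=300_000):
        self.max_iterations = max_iterations
        self.token_budget = token_budget    # 0 = unlimited
        self.timeout_ms = timeout_ms
        self.tokens_used = 0
        self.status = "running"

    def run(self):
        started = time.monotonic()
        for _ in range(self.max_iterations):
            # Timeout enforcement: check wall-clock time before each LLM call
            if (time.monotonic() - started) * 1000 >= self.timeout_ms:
                self.status = "failed"
                return
            # Token budget: stop before exceeding the per-agent limit
            if self.token_budget and self.tokens_used >= self.token_budget:
                self.status = "failed"
                return
            thought, done = self.think()    # would emit a "thought" event
            if done:
                self.status = "completed"
                return
            self.act(thought)               # would emit "action"/"observation"
        # Iteration cap reached without completion (treated as failure here)
        self.status = "failed"

    def think(self): ...   # subclasses implement the LLM / CLI call
    def act(self, thought): ...
```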
AgentWorker
Extends BaseAgentWorker for standard LLM providers (Ollama, OpenAI, Anthropic, Gemini via LiteLLM):
- Sends messages with tool definitions using the OpenAI SDK format
- Parses tool call responses and routes them to the `ToolExecutor`
- Handles context compaction when the conversation exceeds token limits
- Supports `metadata.extraBody` for per-model custom parameters
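One way the `metadata.extraBody` hook could work is to merge per-model parameters into the outgoing request without letting them clobber core fields. This is a sketch under that assumption; the merge policy and function name are illustrative.

```python
def build_request(model, messages, tools, metadata=None):
    """Assemble an OpenAI-SDK-style request dict, folding in extraBody params."""
    request = {"model": model, "messages": messages, "tools": tools}
    extra = (metadata or {}).get("extraBody") or {}
    for key, value in extra.items():
        # setdefault: custom parameters are added, but core fields
        # (model, messages, tools) are never overwritten
        request.setdefault(key, value)
    return request
```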
CLIAgentWorker
Extends BaseAgentWorker for subscription-based CLI tools:
- Spawns CLI processes (Claude Code, Gemini CLI, Codex CLI) as subprocesses
- Uses CLI-specific adapters to build arguments and parse output
- Falls back to the default LLM model if the CLI tool fails (quota exhaustion, crash)
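The spawn-and-fallback behavior can be sketched as below. The function name, timeout, and fallback signature are assumptions for illustration; the real worker also streams and parses CLI output through its adapters.

```python
import subprocess

def run_with_fallback(cli_args, llm_fallback, prompt):
    """Run a CLI tool; on failure (non-zero exit, crash, timeout),
    fall back to the default LLM model."""
    try:
        result = subprocess.run(cli_args, capture_output=True, text=True, timeout=300)
        if result.returncode == 0:
            return result.stdout
    except (OSError, subprocess.TimeoutExpired):
        pass
    # Quota exhaustion or crash: route the task to the default LLM path
    return llm_fallback(prompt)
```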
Structured Expert Prompts
Each expert (preset) has a structured system prompt that goes beyond a simple role description. When an agent is spawned, the system injects three additional prompt sections:
Critical Rules
Hard constraints the agent must follow. These are framed as non-negotiable directives specific to the expert’s domain. For example, a Security Analyst expert might include rules like “Never recommend disabling authentication” or “Always verify TLS certificates.”
Deliverable Templates
Standardized output formats the agent should produce. These ensure consistent, actionable results regardless of which model is executing. Examples include structured code review checklists, deployment runbooks, or security audit reports.
Success Metrics
Measurable criteria the agent uses to self-evaluate its work. These give the model a clear definition of “done” and help prevent premature task completion. Metrics might include “All tests pass,” “No critical vulnerabilities remain,” or “Coverage exceeds 80%.”
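Putting the three sections together, prompt assembly might look like the sketch below. The section titles follow this page; the exact layout, helper name, and markdown framing are assumptions.

```python
def build_system_prompt(role, critical_rules, deliverable_templates, success_metrics):
    """Append the three structured sections to an expert's base role prompt."""
    parts = [role]
    parts.append("## Critical Rules\n" + "\n".join(f"- {r}" for r in critical_rules))
    parts.append("## Deliverable Templates\n" + "\n".join(deliverable_templates))
    parts.append("## Success Metrics\n" + "\n".join(f"- {m}" for m in success_metrics))
    return "\n\n".join(parts)
```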
Tool Execution
The ToolExecutor handles all tool calls from agents:
- Permission check: Validates the tool call against the three-tier permission system (ALLOW / ASK / DENY)
- Secret injection: Substitutes `{{secret:name}}` templates in tool arguments with vault values
- Execution: Delegates to the appropriate skill implementation
- Error tracking: Counts consecutive failures per tool
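The secret-injection step can be sketched as a recursive template substitution over tool arguments. The vault is modeled as a plain dict here, which is an assumption; the real vault presumably decrypts values on access.

```python
import re

# Matches {{secret:name}} placeholders in string arguments
SECRET_RE = re.compile(r"\{\{secret:([A-Za-z0-9_-]+)\}\}")

def inject_secrets(args, vault):
    """Recursively substitute {{secret:name}} templates with vault values."""
    def resolve(value):
        if isinstance(value, str):
            return SECRET_RE.sub(lambda m: vault[m.group(1)], value)
        if isinstance(value, dict):
            return {k: resolve(v) for k, v in value.items()}
        if isinstance(value, list):
            return [resolve(v) for v in value]
        return value  # numbers, booleans, None pass through unchanged
    return resolve(args)
```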
Consecutive Failure Protection
Tools are automatically disabled after 3 consecutive failures. When this happens:
- The tool is stripped from subsequent LLM requests
- The model is forced to respond using available information
- This prevents infinite loops where a model repeatedly calls a failing tool
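A minimal sketch of this protection, assuming a success resets the streak (the class and method names are illustrative):

```python
MAX_CONSECUTIVE_FAILURES = 3

class FailureTracker:
    """Disables a tool after 3 consecutive failures; a success resets it."""

    def __init__(self):
        self.failures = {}
        self.disabled = set()

    def record(self, tool, ok):
        if ok:
            self.failures[tool] = 0
        else:
            self.failures[tool] = self.failures.get(tool, 0) + 1
            if self.failures[tool] >= MAX_CONSECUTIVE_FAILURES:
                self.disabled.add(tool)

    def active_tools(self, tools):
        # Disabled tools are stripped from subsequent LLM requests
        return [t for t in tools if t not in self.disabled]
```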
Context Compaction
When a conversation exceeds the token window, the runtime applies LLM-summarized context compaction:
- The current conversation history is sent to the LLM with a summarization prompt
- The LLM produces a compressed summary of the conversation so far
- The summary replaces the full history, freeing tokens for continued work
This allows long-running agents to work beyond the context window limit without losing important context.
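The three steps above reduce to a small function. In this sketch, `summarize` stands in for the real LLM call with its summarization prompt, and the decision to keep the summary as a single system message is an assumption.

```python
def compact(messages, count_tokens, window, summarize):
    """Replace an over-budget history with an LLM-produced summary."""
    if sum(count_tokens(m["content"]) for m in messages) <= window:
        return messages  # still fits: leave the history untouched
    summary = summarize(messages)  # LLM call with a summarization prompt
    # The summary replaces the full history, freeing tokens for continued work
    return [{"role": "system", "content": f"Conversation summary: {summary}"}]
```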
Agent Lifecycle
spawn → running → [paused] → completed / failed / stopped

| Status | Description |
|---|---|
| `running` | Agent is actively processing in the think-act-observe loop |
| `paused` | Agent is temporarily suspended (can be resumed) |
| `completed` | Agent finished its task successfully |
| `failed` | Agent encountered an unrecoverable error |
| `stopped` | Agent was manually stopped by a user |
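The diagram above can be written as an explicit transition table; how Octipus actually enforces transitions is not documented here, so treat this map as an assumption.

```python
# Valid transitions; completed / failed / stopped are terminal states
TRANSITIONS = {
    "spawn": {"running"},
    "running": {"paused", "completed", "failed", "stopped"},
    "paused": {"running", "stopped"},   # paused agents can be resumed
}

def can_transition(current, target):
    return target in TRANSITIONS.get(current, set())
```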
Configuration
| Variable | Default | Description |
|---|---|---|
| `AGENT_MAX_TOKEN_BUDGET` | 100000 | Maximum tokens per agent (0 = unlimited) |
| `AGENT_DEFAULT_TIMEOUT` | 300000 | Wall-clock timeout in ms (5 minutes) |
| `AGENT_MAX_ITERATIONS` | 50 | Maximum think-act-observe iterations |
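Reading these variables with their documented defaults might look like this (the function name and returned key names are illustrative):

```python
import os

def load_agent_config(env=os.environ):
    """Read runtime limits from the environment, with documented defaults."""
    return {
        "max_token_budget": int(env.get("AGENT_MAX_TOKEN_BUDGET", "100000")),
        "default_timeout_ms": int(env.get("AGENT_DEFAULT_TIMEOUT", "300000")),
        "max_iterations": int(env.get("AGENT_MAX_ITERATIONS", "50")),
    }
```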
Agent Events
Each agent emits events that can be consumed via the REST polling API or WebSocket:
| Event Type | Description |
|---|---|
| `thought` | The agent’s reasoning about what to do next |
| `action` | A tool call being made |
| `observation` | The result of a tool call |
| `error` | An error encountered during execution |
| `status` | Agent status change (running, paused, completed, etc.) |
Events are stored in a ring buffer (max 200 events per agent) with sequential IDs for cursor-based polling.
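A ring buffer with sequential IDs and cursor-based polling can be sketched as follows; the capacity of 200 matches the docs, while the class and method names are assumptions.

```python
from collections import deque

class EventBuffer:
    """Per-agent event ring buffer with monotonically increasing IDs."""

    def __init__(self, capacity=200):
        self.events = deque(maxlen=capacity)  # oldest events are evicted
        self.next_id = 0

    def emit(self, event_type, payload):
        self.events.append({"id": self.next_id, "type": event_type, "data": payload})
        self.next_id += 1

    def poll(self, cursor):
        """Return events newer than `cursor` plus the new cursor position."""
        fresh = [e for e in self.events if e["id"] > cursor]
        return fresh, (fresh[-1]["id"] if fresh else cursor)
```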