Agent Runtime

The agent runtime is the execution engine at the heart of Octipus. It manages the lifecycle of autonomous agents that follow a Thought-Action-Observation loop to accomplish tasks.

Architecture Overview

The runtime consists of three main classes arranged in an inheritance hierarchy:

BaseAgentWorker (abstract)
├── AgentWorker          — LLM-based agent (OpenAI SDK / LiteLLM)
└── CLIAgentWorker       — CLI tool agent (Claude Code, Gemini CLI, Codex)

BaseAgentWorker

The abstract base class that defines the agent lifecycle:

Event emission: Publishes thought, action, observation, error, and status events
Iteration loop: Runs the think-act-observe cycle up to AGENT_MAX_ITERATIONS times
Token budget: Tracks cumulative token usage and stops agents before they exceed the per-agent limit
Timeout enforcement: Checks wall-clock time before each LLM call
Status management: Tracks agent state (running, paused, completed, failed, stopped)
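
The way these limits interact can be sketched as a single loop. This is a minimal illustration, not the actual BaseAgentWorker code: `runLoop`, `Limits`, `StepResult`, and `StopReason` are hypothetical names, the step is synchronous for brevity (a real LLM call would be asynchronous), and the budget check only runs between iterations, so one step can overshoot the budget.

```typescript
// Hypothetical names throughout; only the three limits come from the configuration table.
interface Limits {
  maxIterations: number; // AGENT_MAX_ITERATIONS
  maxTokenBudget: number; // AGENT_MAX_TOKEN_BUDGET, 0 = unlimited
  timeoutMs: number; // AGENT_DEFAULT_TIMEOUT
}

interface StepResult {
  tokensUsed: number; // tokens consumed by this think-act-observe step
  done: boolean; // true when the agent considers the task finished
}

type StopReason = "completed" | "max_iterations" | "token_budget" | "timeout";

function runLoop(
  limits: Limits,
  step: (iteration: number) => StepResult,
  now: () => number = Date.now,
): { reason: StopReason; iterations: number; tokens: number } {
  const startedAt = now();
  let tokens = 0;
  for (let i = 0; i < limits.maxIterations; i++) {
    // Wall-clock check before each (simulated) LLM call.
    if (now() - startedAt >= limits.timeoutMs) {
      return { reason: "timeout", iterations: i, tokens };
    }
    // Cumulative budget check; a budget of 0 disables it.
    if (limits.maxTokenBudget > 0 && tokens >= limits.maxTokenBudget) {
      return { reason: "token_budget", iterations: i, tokens };
    }
    const result = step(i);
    tokens += result.tokensUsed;
    if (result.done) {
      return { reason: "completed", iterations: i + 1, tokens };
    }
  }
  return { reason: "max_iterations", iterations: limits.maxIterations, tokens };
}
```

Passing the clock as a parameter is a design choice made here so the timeout path can be exercised without waiting in real time.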

AgentWorker

Extends BaseAgentWorker for standard LLM providers (Ollama, OpenAI, Anthropic, Gemini via LiteLLM):

Sends messages with tool definitions using the OpenAI SDK format
Parses tool call responses and routes them to the ToolExecutor
Handles context compaction when the conversation exceeds token limits
Supports metadata.extraBody for per-model custom parameters
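
To make the parse-and-route step concrete, the sketch below handles tool calls in the OpenAI chat format, where each call carries its arguments as a JSON string and each result goes back as a `tool` message keyed by `tool_call_id`. The `routeToolCalls` function and the `ExecuteTool` callback are hypothetical stand-ins for the real AgentWorker and ToolExecutor interfaces, which may look different.

```typescript
// Shape of a tool call inside an OpenAI-format assistant message.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

// Shape of the message that carries a tool result back to the model.
interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

// Hypothetical stand-in for the ToolExecutor entry point.
type ExecuteTool = (name: string, args: Record<string, unknown>) => string;

function routeToolCalls(calls: ToolCall[], execute: ExecuteTool): ToolMessage[] {
  return calls.map((call): ToolMessage => {
    let content: string;
    try {
      const parsed: unknown = JSON.parse(call.function.arguments || "{}");
      if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
        throw new Error("tool arguments must be a JSON object");
      }
      content = execute(call.function.name, parsed as Record<string, unknown>);
    } catch (err) {
      // Malformed JSON and tool failures are both reported back as text.
      content = `Error: ${err instanceof Error ? err.message : String(err)}`;
    }
    return { role: "tool", tool_call_id: call.id, content };
  });
}
```

Returning errors as tool messages rather than throwing is an assumption of this sketch; it lets the model see the failure as an observation and decide what to do next.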

CLIAgentWorker

Extends BaseAgentWorker for subscription-based CLI tools:

Spawns CLI processes (Claude Code, Gemini CLI, Codex CLI) as subprocesses
Uses CLI-specific adapters to build arguments and parse output
Falls back to the default LLM model if the CLI tool fails (for example, on quota exhaustion or a crash)
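
The adapter-plus-fallback pattern described above might look roughly like the following. Everything here is illustrative: `CliAdapter`, `RunProcess`, `runWithFallback`, and the `example-cli` command with its `--prompt` flag are hypothetical and do not correspond to the real flags of Claude Code, Gemini CLI, or Codex. Process spawning is injected as a function so the sketch stays independent of any particular process API.

```typescript
// Hypothetical adapter shape; the real adapters may differ.
interface CliAdapter {
  command: string;
  buildArgs(prompt: string): string[];
  parseOutput(stdout: string): string;
}

interface ProcessResult {
  exitCode: number;
  stdout: string;
}

// Injected so the sketch does not depend on a specific subprocess API.
type RunProcess = (command: string, args: string[]) => ProcessResult;

function runWithFallback(
  adapter: CliAdapter,
  prompt: string,
  runProcess: RunProcess,
  fallbackLlm: (prompt: string) => string,
): { source: "cli" | "fallback"; text: string } {
  try {
    const result = runProcess(adapter.command, adapter.buildArgs(prompt));
    if (result.exitCode === 0) {
      return { source: "cli", text: adapter.parseOutput(result.stdout) };
    }
  } catch {
    // Spawn failure or crash: fall through to the fallback model.
  }
  return { source: "fallback", text: fallbackLlm(prompt) };
}

// Made-up adapter for a made-up tool, used only to exercise the function.
const demoAdapter: CliAdapter = {
  command: "example-cli",
  buildArgs: (prompt) => ["--prompt", prompt],
  parseOutput: (stdout) => stdout.trim(),
};
```

Treating any non-zero exit code as a failure is a simplification; a real worker would likely inspect the output to tell quota exhaustion apart from other errors.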

Structured Expert Prompts

Each expert (preset) has a structured system prompt that goes beyond a simple role description. When an agent is spawned, the system injects three additional prompt sections:

Critical Rules

Hard constraints the agent must follow. These are framed as non-negotiable directives specific to the expert’s domain. For example, a Security Analyst expert might include rules like “Never recommend disabling authentication” or “Always verify TLS certificates.”

Deliverable Templates

Standardized output formats the agent should produce. These ensure consistent, actionable results regardless of which model is executing. Examples include structured code review checklists, deployment runbooks, or security audit reports.

Success Metrics

Measurable criteria the agent uses to self-evaluate its work. These give the model a clear definition of “done” and help prevent premature task completion. Metrics might include “All tests pass,” “No critical vulnerabilities remain,” or “Coverage exceeds 80%.”
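
One plausible way to assemble the three sections into a single system prompt is sketched below. The `ExpertPreset` shape, the field names, and the heading-and-bullet layout are assumptions made for illustration; the actual prompt format is not specified here.

```typescript
// Hypothetical preset shape; the field names are not taken from the real schema.
interface ExpertPreset {
  role: string;
  criticalRules: string[];
  deliverableTemplates: string[];
  successMetrics: string[];
}

// Renders one titled section as a bullet list, or nothing if it has no items.
function section(title: string, items: string[]): string {
  if (items.length === 0) return "";
  return `## ${title}\n` + items.map((item) => `- ${item}`).join("\n");
}

function buildSystemPrompt(preset: ExpertPreset): string {
  return [
    preset.role,
    section("Critical Rules", preset.criticalRules),
    section("Deliverable Templates", preset.deliverableTemplates),
    section("Success Metrics", preset.successMetrics),
  ]
    .filter((part) => part.length > 0)
    .join("\n\n");
}

// Example values taken from the descriptions above.
const securityAnalyst: ExpertPreset = {
  role: "You are a Security Analyst.",
  criticalRules: ["Never recommend disabling authentication", "Always verify TLS certificates"],
  deliverableTemplates: [],
  successMetrics: ["No critical vulnerabilities remain"],
};
```

Skipping empty sections keeps the prompt free of headings with no content under them.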

Tool Execution

The ToolExecutor handles all tool calls from agents:

Permission check: Validates the tool call against the three-tier permission system (ALLOW / ASK / DENY)
Secret injection: Substitutes {{secret:name}} templates in tool arguments with vault values
Execution: Delegates to the appropriate skill implementation
Error tracking: Counts consecutive failures per tool
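
The first three steps can be sketched as a small pipeline. Only the `{{secret:name}}` template syntax and the ALLOW / ASK / DENY tiers come from the description above; `handleToolCall`, `injectSecrets`, the plain-object vault, and the choice to treat unlisted tools as ASK are assumptions for illustration. Failure counting is left out here.

```typescript
type Permission = "ALLOW" | "ASK" | "DENY";

type Outcome =
  | { status: "executed"; result: string }
  | { status: "needs_approval" }
  | { status: "denied" };

const SECRET_PATTERN = /\{\{secret:([^}]+)\}\}/g;

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Replaces every {{secret:name}} template with the matching vault value.
function injectSecrets(value: string, vault: Record<string, string>): string {
  return value.replace(SECRET_PATTERN, (_match: string, rawName: string) => {
    const name = rawName.trim();
    if (!hasOwn(vault, name)) throw new Error(`Unknown secret: ${name}`);
    return vault[name] as string;
  });
}

function handleToolCall(
  request: { tool: string; args: Record<string, string> },
  permissions: Record<string, Permission>,
  vault: Record<string, string>,
  skills: Record<string, (args: Record<string, string>) => string>,
): Outcome {
  // 1. Permission check. Unlisted tools default to ASK (an assumption of this sketch).
  const permission: Permission = hasOwn(permissions, request.tool)
    ? (permissions[request.tool] as Permission)
    : "ASK";
  if (permission === "DENY") return { status: "denied" };
  if (permission === "ASK") return { status: "needs_approval" };

  // 2. Secret injection, done only after the call has been allowed.
  const resolved: Record<string, string> = {};
  for (const key of Object.keys(request.args)) {
    resolved[key] = injectSecrets(request.args[key] as string, vault);
  }

  // 3. Execution via the matching skill implementation.
  if (!hasOwn(skills, request.tool)) throw new Error(`No skill for tool: ${request.tool}`);
  const skill = skills[request.tool] as (args: Record<string, string>) => string;
  return { status: "executed", result: skill(resolved) };
}
```

Resolving secrets after the permission check means a denied call never touches vault values, which matches the order of the steps listed above.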

Consecutive Failure Protection

Tools are automatically disabled after 3 consecutive failures. When this happens:

The tool is stripped from subsequent LLM requests
The model is forced to respond using available information
This prevents infinite loops where a model repeatedly calls a failing tool
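
A minimal sketch of this protection, assuming a per-tool counter that resets on success: the threshold of 3 comes from the text above, while `FailureTracker` and its method names are hypothetical.

```typescript
const MAX_CONSECUTIVE_FAILURES = 3;

class FailureTracker {
  // Null-prototype object so tool names never collide with built-in properties.
  private failures: Record<string, number> = Object.create(null);

  // Records one call outcome; a success resets the tool's streak.
  record(tool: string, ok: boolean): void {
    this.failures[tool] = ok ? 0 : (this.failures[tool] ?? 0) + 1;
  }

  isDisabled(tool: string): boolean {
    return (this.failures[tool] ?? 0) >= MAX_CONSECUTIVE_FAILURES;
  }

  // Strips disabled tools from the definitions sent with the next LLM request.
  filterTools<T extends { name: string }>(tools: T[]): T[] {
    return tools.filter((tool) => !this.isDisabled(tool.name));
  }
}
```

Because a disabled tool is no longer offered to the model, it cannot succeed again and reset its counter, so in this sketch a tool stays disabled for the rest of the run.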

Context Compaction

When a conversation exceeds the model's context window, the runtime compacts it using an LLM-generated summary:

The current conversation history is sent to the LLM with a summarization prompt
The LLM produces a compressed summary of the conversation so far
The summary replaces the full history, freeing tokens for continued work

This lets long-running agents continue working past the context window limit while preserving the key context in summarized form.
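
The three steps can be sketched as follows. Several parts are assumptions rather than the runtime's actual behavior: the four-characters-per-token estimate stands in for a real tokenizer, `summarize` is a synchronous stand-in for an asynchronous LLM call, the prompt text is invented, and system messages are kept in front of the summary instead of being summarized away.

```typescript
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Invented prompt text, for illustration only.
const SUMMARY_PROMPT = "Summarize the conversation so far, keeping decisions, results, and open tasks.";

// Crude stand-in for a real tokenizer: roughly 4 characters per token.
function estimateTokens(messages: Message[]): number {
  let chars = 0;
  for (const message of messages) chars += message.content.length;
  return Math.ceil(chars / 4);
}

function compactIfNeeded(
  history: Message[],
  tokenLimit: number,
  summarize: (request: Message[]) => string, // stand-in for the LLM call
): Message[] {
  if (estimateTokens(history) <= tokenLimit) return history;

  // 1. Send the current history to the LLM with a summarization prompt.
  const promptMessage: Message = { role: "user", content: SUMMARY_PROMPT };
  const summary = summarize(history.concat([promptMessage]));

  // 2-3. Replace the history with the summary, keeping system messages (an assumption).
  const systemMessages = history.filter((message) => message.role === "system");
  const summaryMessage: Message = {
    role: "assistant",
    content: `Summary of the conversation so far:\n${summary}`,
  };
  return systemMessages.concat([summaryMessage]);
}
```

The check runs on the whole history, so calling it before every LLM request keeps the conversation under the limit as long as the summary itself fits.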

Agent Lifecycle

spawn → running → [paused] → completed / failed / stopped

| Status | Description |
| --- | --- |
| `running` | Agent is actively processing in the think-act-observe loop |
| `paused` | Agent is temporarily suspended (can be resumed) |
| `completed` | Agent finished its task successfully |
| `failed` | Agent encountered an unrecoverable error |
| `stopped` | Agent was manually stopped by a user |
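
The lifecycle can be modeled as a small state machine. The five status names come from the table above; the transition map is an assumption inferred from the diagram and the note that a paused agent can be resumed, and whether a paused agent can be stopped directly is a guess.

```typescript
type AgentStatus = "running" | "paused" | "completed" | "failed" | "stopped";

// Assumed transition map; terminal states have no outgoing transitions.
const TRANSITIONS: Record<AgentStatus, AgentStatus[]> = {
  running: ["paused", "completed", "failed", "stopped"],
  paused: ["running", "stopped"],
  completed: [],
  failed: [],
  stopped: [],
};

function canTransition(from: AgentStatus, to: AgentStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new status, or throws if the change is not allowed.
function transition(from: AgentStatus, to: AgentStatus): AgentStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid status change: ${from} -> ${to}`);
  }
  return to;
}

function isTerminal(status: AgentStatus): boolean {
  return TRANSITIONS[status].length === 0;
}
```

Keeping the allowed transitions in one table makes it easy to reject changes such as resuming an agent that has already completed.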

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| `AGENT_MAX_TOKEN_BUDGET` | `100000` | Maximum tokens per agent (0 = unlimited) |
| `AGENT_DEFAULT_TIMEOUT` | `300000` | Wall-clock timeout in ms (5 minutes) |
| `AGENT_MAX_ITERATIONS` | `50` | Maximum think-act-observe iterations |
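
As an illustration of how these variables and defaults might be read, the sketch below parses them from an environment-like object. The variable names and defaults are taken from the table; `loadAgentConfig`, `readInt`, the `AgentConfig` field names, and the validation rules are hypothetical.

```typescript
interface AgentConfig {
  maxTokenBudget: number; // 0 = unlimited
  defaultTimeoutMs: number;
  maxIterations: number;
}

type Env = Record<string, string | undefined>;

// Reads a non-negative integer, falling back to the default when unset or blank.
function readInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!isFinite(value) || value < 0 || Math.floor(value) !== value) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

function loadAgentConfig(env: Env): AgentConfig {
  return {
    maxTokenBudget: readInt(env, "AGENT_MAX_TOKEN_BUDGET", 100000),
    defaultTimeoutMs: readInt(env, "AGENT_DEFAULT_TIMEOUT", 300000),
    maxIterations: readInt(env, "AGENT_MAX_ITERATIONS", 50),
  };
}
```

In a Node.js process the `env` argument would typically be `process.env`; taking it as a parameter keeps the function easy to test.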

Agent Events

Each agent emits events that can be consumed via the REST polling API or WebSocket:

| Event Type | Description |
| --- | --- |
| `thought` | The agent’s reasoning about what to do next |
| `action` | A tool call being made |
| `observation` | The result of a tool call |
| `error` | An error encountered during execution |
| `status` | Agent status change (running, paused, completed, etc.) |

Events are stored in a ring buffer (max 200 events per agent) with sequential IDs for cursor-based polling.
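
Cursor-based polling over a bounded buffer can be sketched as below. The 200-event cap and the sequential IDs come from the sentence above; `EventBuffer`, `push`, and `poll` are hypothetical names, and the sketch drops old events with `Array.shift` for brevity rather than using a fixed-size circular array.

```typescript
interface AgentEvent {
  id: number; // sequential, never reused
  type: "thought" | "action" | "observation" | "error" | "status";
  data: string;
}

class EventBuffer {
  private events: AgentEvent[] = [];
  private nextId = 1;
  private capacity: number;

  constructor(capacity: number = 200) {
    this.capacity = capacity;
  }

  push(type: AgentEvent["type"], data: string): AgentEvent {
    const event: AgentEvent = { id: this.nextId++, type, data };
    this.events.push(event);
    // Drop the oldest event once the buffer is full.
    if (this.events.length > this.capacity) this.events.shift();
    return event;
  }

  // Returns events newer than the cursor, plus the cursor to send on the next poll.
  poll(cursor: number): { events: AgentEvent[]; cursor: number } {
    const events = this.events.filter((event) => event.id > cursor);
    const last = events.length > 0 ? events[events.length - 1] : undefined;
    return { events, cursor: last ? last.id : cursor };
  }
}
```

A client starts with cursor 0 and passes back the returned cursor on each poll. If more events than the buffer holds arrive between polls, the oldest ones are gone, and the client can detect this from a gap in the IDs.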