Agent Runtime

The agent runtime is the execution engine at the heart of Octipus. It manages the lifecycle of autonomous agents that follow a Thought-Action-Observation loop to accomplish tasks.

The runtime consists of three main classes arranged in an inheritance hierarchy:

BaseAgentWorker (abstract)
├── AgentWorker — LLM-based agent (OpenAI SDK / LiteLLM)
└── CLIAgentWorker — CLI tool agent (Claude Code, Gemini CLI, Codex)

BaseAgentWorker is the abstract base class that defines the agent lifecycle:

  • Event emission: Publishes thought, action, observation, error, and status events
  • Iteration loop: Runs the think-act-observe cycle up to AGENT_MAX_ITERATIONS times
  • Token budget: Tracks cumulative token usage and stops agents before they exceed the per-agent limit
  • Timeout enforcement: Checks wall-clock time before each LLM call
  • Status management: Tracks agent state (running, paused, completed, failed, stopped)
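
The loop and its three guards (iteration cap, token budget, timeout) can be sketched as follows. This is a minimal synchronous sketch, not the Octipus implementation: `runLoop`, `StepResult`, `Limits`, and the injected `step` and `now` functions are hypothetical names, and a real runtime would await the model call inside `step`.

```typescript
// Hypothetical types for illustration; none of these names come from Octipus.
type StopReason = "completed" | "max_iterations" | "token_budget" | "timeout";

interface StepResult {
  tokensUsed: number; // tokens consumed by this think-act-observe step
  done: boolean;      // true when the agent reports the task is finished
}

interface Limits {
  maxIterations: number; // e.g. AGENT_MAX_ITERATIONS
  maxTokens: number;     // e.g. AGENT_MAX_TOKEN_BUDGET, 0 = unlimited
  timeoutMs: number;     // e.g. AGENT_DEFAULT_TIMEOUT
}

interface LoopOutcome {
  reason: StopReason;
  iterations: number;
  tokens: number;
}

function runLoop(
  step: (iteration: number) => StepResult,
  limits: Limits,
  now: () => number
): LoopOutcome {
  const startedAt = now();
  let tokens = 0;

  for (let i = 0; i < limits.maxIterations; i++) {
    // Guards run before each step, so an exhausted agent never starts a new one.
    if (now() - startedAt >= limits.timeoutMs) {
      return { reason: "timeout", iterations: i, tokens: tokens };
    }
    if (limits.maxTokens > 0 && tokens >= limits.maxTokens) {
      return { reason: "token_budget", iterations: i, tokens: tokens };
    }

    const result = step(i);
    tokens += result.tokensUsed;
    if (result.done) {
      return { reason: "completed", iterations: i + 1, tokens: tokens };
    }
  }
  return { reason: "max_iterations", iterations: limits.maxIterations, tokens: tokens };
}
```

Note that this sketch compares the running total against the budget before each step, so a single step can still overshoot. How the runtime estimates the cost of the next call is not covered here.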

AgentWorker extends BaseAgentWorker for standard LLM providers (Ollama, OpenAI, Anthropic, Gemini via LiteLLM):

  • Sends messages with tool definitions using the OpenAI SDK format
  • Parses tool call responses and routes them to the ToolExecutor
  • Handles context compaction when the conversation exceeds token limits
  • Supports metadata.extraBody for per-model custom parameters
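
As an illustration of the tool-call parsing and `extraBody` handling, here is a small sketch. The `ToolCall` shape follows the OpenAI chat-completions format, where `arguments` is a JSON-encoded string. The helper names `parseToolCalls` and `buildRequestBody` are hypothetical, and the merge order (base fields win over `extraBody`) is an assumption, not documented behavior.

```typescript
// Shape of a tool call in an OpenAI-style chat completion message.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface ParsedCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

// Hypothetical helper: decode each tool call's JSON arguments.
function parseToolCalls(calls: ToolCall[]): ParsedCall[] {
  return calls.map((call) => {
    let args: Record<string, unknown> = {};
    try {
      const decoded: unknown = JSON.parse(call.function.arguments);
      if (typeof decoded === "object" && decoded !== null && !Array.isArray(decoded)) {
        args = decoded as Record<string, unknown>;
      }
    } catch {
      // Malformed JSON from the model: fall back to empty arguments.
    }
    return { id: call.id, name: call.function.name, args: args };
  });
}

// Hypothetical helper: merge per-model extra parameters into the request body.
// Assumption: base fields take precedence so extraBody cannot replace
// the model, messages, or tools.
function buildRequestBody(
  base: { model: string; messages: unknown[]; tools: unknown[] },
  extraBody?: Record<string, unknown>
): Record<string, unknown> {
  return { ...(extraBody || {}), ...base };
}
```

Each `ParsedCall` would then be handed to the ToolExecutor, which is outside the scope of this sketch.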

CLIAgentWorker extends BaseAgentWorker for subscription-based CLI tools:

  • Spawns CLI processes (Claude Code, Gemini CLI, Codex CLI) as subprocesses
  • Uses CLI-specific adapters to build arguments and parse output
  • Falls back to the default LLM model if the CLI tool fails (quota exhaustion, crash)
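
The adapter and fallback pattern might look like the sketch below. Everything here is hypothetical: the `CliAdapter` contract, the `runWithFallback` helper, and the fictional `example-cli` tool with its invented flags. Process spawning is injected as `runCli` so the sketch stays self-contained; in Node.js it could wrap `child_process.spawn`.

```typescript
// Hypothetical adapter contract; the real Octipus adapter interface may differ.
interface CliAdapter {
  command: string;
  buildArgs(prompt: string): string[];
  parseOutput(stdout: string): string;
}

// Placeholder adapter for a fictional tool. The flags are invented for
// illustration and do not belong to Claude Code, Gemini CLI, or Codex CLI.
const exampleAdapter: CliAdapter = {
  command: "example-cli",
  buildArgs: (prompt) => ["--non-interactive", "--prompt", prompt],
  parseOutput: (stdout) => stdout.trim(),
};

interface RunResult {
  output: string;
  usedFallback: boolean;
}

// Try the CLI first; if it throws (crash, quota exhaustion), use the default LLM.
function runWithFallback(
  adapter: CliAdapter,
  prompt: string,
  runCli: (command: string, args: string[]) => string,
  runDefaultModel: (prompt: string) => string
): RunResult {
  try {
    const stdout = runCli(adapter.command, adapter.buildArgs(prompt));
    return { output: adapter.parseOutput(stdout), usedFallback: false };
  } catch {
    return { output: runDefaultModel(prompt), usedFallback: true };
  }
}
```

Keeping argument building and output parsing behind one interface means the fallback logic does not need to know which CLI tool is in use.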

Each expert (preset) now has a structured system prompt that goes beyond a simple role description. When an agent is spawned, the system injects three additional prompt sections:

The first section sets hard constraints the agent must follow, framed as non-negotiable directives specific to the expert’s domain. For example, a Security Analyst expert might include rules like “Never recommend disabling authentication” or “Always verify TLS certificates.”

The second section defines standardized output formats the agent should produce, so results stay consistent and actionable regardless of which model is executing. Examples include structured code review checklists, deployment runbooks, and security audit reports.

The third section lists measurable criteria the agent uses to self-evaluate its work. These give the model a clear definition of “done” and help prevent premature task completion. Metrics might include “All tests pass,” “No critical vulnerabilities remain,” or “Coverage exceeds 80%.”
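
To make the prompt assembly concrete, here is a minimal sketch of how the three sections could be appended to a role description. The `ExpertPreset` shape, its field names, and the section titles in the rendered prompt are all hypothetical; the actual preset schema and wording may differ.

```typescript
// Hypothetical preset shape; field names and section titles are illustrative only.
interface ExpertPreset {
  role: string;
  rules: string[];
  outputFormats: string[];
  successMetrics: string[];
}

function renderSection(title: string, items: string[]): string {
  if (items.length === 0) return "";
  return title + ":\n" + items.map((item) => "- " + item).join("\n");
}

function buildSystemPrompt(preset: ExpertPreset): string {
  const sections = [
    preset.role,
    renderSection("Rules you must follow", preset.rules),
    renderSection("Expected output formats", preset.outputFormats),
    renderSection("Success criteria", preset.successMetrics),
  ];
  // Drop empty sections so a preset without one of the lists still renders cleanly.
  return sections.filter((s) => s.length > 0).join("\n\n");
}
```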

The ToolExecutor handles all tool calls from agents:

  1. Permission check: Validates the tool call against the three-tier permission system (ALLOW / ASK / DENY)
  2. Secret injection: Substitutes {{secret:name}} templates in tool arguments with vault values
  3. Execution: Delegates to the appropriate skill implementation
  4. Error tracking: Counts consecutive failures per tool
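
The first three steps can be sketched as a single function. This is an illustrative sketch under several assumptions: the `executeTool` and `injectSecrets` names, the `approve` callback for ASK, the default of ASK for unlisted tools, and the allowed characters in secret names are all hypothetical. Only the ALLOW / ASK / DENY tiers and the `{{secret:name}}` template come from the text above.

```typescript
type Permission = "ALLOW" | "ASK" | "DENY";

interface ToolRequest {
  tool: string;
  args: Record<string, string>;
}

// Replace every {{secret:name}} placeholder with the matching vault value.
function injectSecrets(value: string, vault: Record<string, string>): string {
  return value.replace(/\{\{secret:([A-Za-z0-9_.-]+)\}\}/g, (_match, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(vault, name)) {
      throw new Error("Unknown secret: " + name);
    }
    return vault[name];
  });
}

function executeTool(
  request: ToolRequest,
  permissions: Record<string, Permission>,
  vault: Record<string, string>,
  approve: (request: ToolRequest) => boolean,
  skills: Record<string, (args: Record<string, string>) => string>
): string {
  // 1. Permission check. Unlisted tools default to ASK in this sketch.
  const permission: Permission = permissions[request.tool] || "ASK";
  if (permission === "DENY") throw new Error("Tool denied: " + request.tool);
  if (permission === "ASK" && !approve(request)) {
    throw new Error("Tool not approved: " + request.tool);
  }

  // 2. Secret injection. The approver above only ever saw the raw templates.
  const resolved: Record<string, string> = {};
  for (const key of Object.keys(request.args)) {
    resolved[key] = injectSecrets(request.args[key], vault);
  }

  // 3. Execution: delegate to the skill registered under the tool name.
  const skill = skills[request.tool];
  if (!skill) throw new Error("Unknown tool: " + request.tool);
  return skill(resolved);
}
```

Resolving secrets only after the permission check means vault values never appear in the arguments the model produced or in an approval prompt.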

Tools are automatically disabled after 3 consecutive failures. When this happens:

  • The tool is stripped from subsequent LLM requests
  • The model is forced to respond using available information
  • This prevents infinite loops where a model repeatedly calls a failing tool
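
A consecutive-failure tracker along these lines would produce the behavior described above. The class and method names are hypothetical, and the assumption that a success resets the streak follows from "consecutive" rather than from documented behavior.

```typescript
const MAX_CONSECUTIVE_FAILURES = 3; // threshold stated in the text above

// Hypothetical tracker; class and method names are illustrative.
class ToolFailureTracker {
  private failures: Record<string, number> = {};

  recordSuccess(tool: string): void {
    this.failures[tool] = 0; // a success resets the streak
  }

  recordFailure(tool: string): void {
    this.failures[tool] = (this.failures[tool] || 0) + 1;
  }

  isDisabled(tool: string): boolean {
    return (this.failures[tool] || 0) >= MAX_CONSECUTIVE_FAILURES;
  }

  // Strip disabled tools from the definitions sent with the next LLM request.
  filterTools<T extends { name: string }>(tools: T[]): T[] {
    return tools.filter((t) => !this.isDisabled(t.name));
  }
}
```

Because a disabled tool is no longer offered to the model, it cannot be called again and so stays disabled for the rest of the run in this sketch.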

When a conversation exceeds the token window, the runtime applies LLM-summarized context compaction:

  1. The current conversation history is sent to the LLM with a summarization prompt
  2. The LLM produces a compressed summary of the conversation so far
  3. The summary replaces the full history, freeing tokens for continued work

This lets long-running agents keep working past the context window limit while retaining the important context in summarized form.
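
The compaction steps can be sketched as follows. This is a simplified illustration with several assumptions: the names `compactIfNeeded` and `estimateTokens` are hypothetical, the token count uses a rough four-characters-per-token heuristic instead of a real tokenizer, the summarizer is injected as a synchronous function, and system messages are kept verbatim (the text above does not say how the system prompt is treated).

```typescript
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Rough estimate: about 4 characters per token. A real runtime would use a tokenizer.
function estimateTokens(messages: Message[]): number {
  let chars = 0;
  for (const m of messages) chars += m.content.length;
  return Math.ceil(chars / 4);
}

function compactIfNeeded(
  history: Message[],
  maxTokens: number,
  summarize: (history: Message[]) => string
): Message[] {
  if (estimateTokens(history) <= maxTokens) return history;

  // Assumption: keep system messages and replace the rest with one summary message.
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");
  const summary: Message = {
    role: "user",
    content: "Summary of the conversation so far:\n" + summarize(rest),
  };
  return system.concat([summary]);
}
```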

spawn → running → [paused] → completed / failed / stopped

| Status | Description |
| --- | --- |
| running | Agent is actively processing in the think-act-observe loop |
| paused | Agent is temporarily suspended (can be resumed) |
| completed | Agent finished its task successfully |
| failed | Agent encountered an unrecoverable error |
| stopped | Agent was manually stopped by a user |

| Variable | Default | Description |
| --- | --- | --- |
| AGENT_MAX_TOKEN_BUDGET | 100000 | Maximum tokens per agent (0 = unlimited) |
| AGENT_DEFAULT_TIMEOUT | 300000 | Wall-clock timeout in ms (5 minutes) |
| AGENT_MAX_ITERATIONS | 50 | Maximum think-act-observe iterations |
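
As a sketch of how these variables and their defaults could be read, assuming they come from the process environment (in Node.js, `process.env` could be passed as `env`). The `AgentLimits` shape and the `loadLimits`, `readInt`, and `isOverBudget` helpers are hypothetical; only the variable names, defaults, and the "0 = unlimited" rule come from the table.

```typescript
interface AgentLimits {
  maxTokenBudget: number;   // 0 = unlimited
  defaultTimeoutMs: number;
  maxIterations: number;
}

const DEFAULT_LIMITS: AgentLimits = {
  maxTokenBudget: 100000,
  defaultTimeoutMs: 300000, // 5 minutes
  maxIterations: 50,
};

// Parse a non-negative integer, falling back to the default on missing or invalid input.
function readInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  const valid = isFinite(value) && value >= 0 && Math.floor(value) === value;
  return valid ? value : fallback;
}

function loadLimits(env: Record<string, string | undefined>): AgentLimits {
  return {
    maxTokenBudget: readInt(env["AGENT_MAX_TOKEN_BUDGET"], DEFAULT_LIMITS.maxTokenBudget),
    defaultTimeoutMs: readInt(env["AGENT_DEFAULT_TIMEOUT"], DEFAULT_LIMITS.defaultTimeoutMs),
    maxIterations: readInt(env["AGENT_MAX_ITERATIONS"], DEFAULT_LIMITS.maxIterations),
  };
}

// A budget of 0 disables the check entirely.
function isOverBudget(tokensUsed: number, limits: AgentLimits): boolean {
  return limits.maxTokenBudget > 0 && tokensUsed >= limits.maxTokenBudget;
}
```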

Each agent emits events that can be consumed via the REST polling API or WebSocket:

| Event Type | Description |
| --- | --- |
| thought | The agent’s reasoning about what to do next |
| action | A tool call being made |
| observation | The result of a tool call |
| error | An error encountered during execution |
| status | Agent status change (running, paused, completed, etc.) |

Events are stored in a ring buffer (max 200 events per agent) with sequential IDs for cursor-based polling.
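
A bounded buffer with sequential IDs and cursor-based polling could be sketched like this. The `EventBuffer` class, its method names, and the string `payload` field are hypothetical, and the sketch uses a plain array with eviction of the oldest entry for brevity instead of a fixed-size circular array.

```typescript
interface AgentEvent {
  id: number; // sequential, never reused
  type: "thought" | "action" | "observation" | "error" | "status";
  payload: string;
}

class EventBuffer {
  private events: AgentEvent[] = [];
  private nextId = 1;
  private capacity: number;

  constructor(capacity: number = 200) {
    this.capacity = capacity;
  }

  push(type: AgentEvent["type"], payload: string): AgentEvent {
    const event: AgentEvent = { id: this.nextId++, type: type, payload: payload };
    this.events.push(event);
    if (this.events.length > this.capacity) this.events.shift(); // evict the oldest
    return event;
  }

  // Return events newer than the cursor, plus the cursor to send on the next poll.
  poll(cursor: number): { events: AgentEvent[]; cursor: number } {
    const fresh = this.events.filter((e) => e.id > cursor);
    const next = fresh.length > 0 ? fresh[fresh.length - 1].id : cursor;
    return { events: fresh, cursor: next };
  }
}
```

A client starts with cursor 0 and passes back the returned cursor on each poll. If it falls more than the buffer capacity behind, the evicted events are simply absent from the response, which the client can detect from a gap in the IDs.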