A review of recent OpenClaw activity reveals critical regressions in tool execution and Codex runtime stability, alongside a surge in requests for advanced multi-agent coordination and cost governance.
The recent window of activity in the OpenClaw repository highlights a period of significant architectural tension. While the project continues to expand its multi-agent capabilities, several high-severity regressions have emerged in the core execution engine, particularly in tool calls and the Codex runtime, and they threaten the stability of automated workflows.
Simultaneously, there is a clear trend toward "production-grade" requirements. Contributors are increasingly requesting deterministic cost governance, structured agent handoffs, and better observability for long-running tasks, signaling a shift from experimental agent use to deployed automation.
Open Issues
Critical Regressions & Stability
Several issues point to unreliable tool execution and an unstable runtime:
- Codex Runtime Stalls: Issue #83109 reports a critical regression where Codex-runtime agents stall indefinitely during tool-using turns. This is attributed to hardcoded features.code_mode_only: true flags in the @openclaw/codex plugin, which force a synthetic JS-eval tool that fails to trigger the necessary task_complete events.
- Tool Call Hangs: Issue #83546 describes a regression in v2026.5.12 where tool outputs frequently hang in WebChat, specifically when tools produce large outputs. This is compounded by reports of commands.log stopping entirely after gateway restarts.
- Codex Dynamic Tooling: Issue #83474 highlights sessions getting stuck in blocked_tool_call state even after successful execution of dynamic bash commands in the Codex harness.
- Event Loop Degradation: Multiple reports (#82936, #77115) indicate severe event-loop stalls and high CPU usage under subagent load, with some cases seeing P99 delays of over 12 seconds, leading to CLI timeouts and SIGKILLs.
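The indefinite stall described in #83109 comes down to waiting on a completion signal that never arrives. A minimal sketch of a bounded wait follows. It is an illustration only, not OpenClaw's implementation: `await_turn_completion` and the `done` event are hypothetical stand-ins for whatever carries the task_complete signal in the real runtime.

```python
import asyncio


async def await_turn_completion(done: asyncio.Event, timeout_s: float) -> str:
    """Wait for a completion signal, but never indefinitely.

    `done` is a hypothetical stand-in for the task_complete event.
    """
    try:
        await asyncio.wait_for(done.wait(), timeout=timeout_s)
        return "completed"
    except asyncio.TimeoutError:
        # The signal never arrived: surface the stall instead of hanging.
        return "stalled"


async def demo() -> tuple[str, str]:
    # Case 1: the completion signal fires shortly after the turn starts.
    healthy = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, healthy.set)
    ok = await await_turn_completion(healthy, timeout_s=1.0)

    # Case 2: the signal is never emitted, as in the reported regression.
    silent = asyncio.Event()
    stuck = await await_turn_completion(silent, timeout_s=0.05)
    return ok, stuck


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A deadline like this does not fix the missing event, but it turns a silent hang into a reportable failure that a caller can retry or log.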
Multi-Agent & Orchestration Gaps
As users deploy more complex agent swarms, the limitations of the current hierarchical delegation model have become apparent:
- Silent Spawn Failures: Issue #83557 reveals that ad-hoc subagent spawns on OpenAI GPT models fail silently if any thinking level other than off is requested.
- Information Silos: A comprehensive RFC (#35203) proposes a "Multi-Agent Collaboration Stack" to solve the problem of isolated workspaces. The proposal suggests a shared "Blackboard" for discoveries and a layered memory system (Private/Team/Global) to prevent redundant research.
- Handoff Fragility: Issue #33478 argues that the current REPLY_SKIP logic for agent-to-agent handoffs is too fragile, as any conversational chatter from the LLM (e.g., "Success!") kills the internal announce loop.
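The fragility described in #33478 can be illustrated by comparing two ways of detecting the sentinel. The source does not say how the check is implemented, so the strict version below is an assumption (an exact-match comparison), and both function names are hypothetical.

```python
SENTINEL = "REPLY_SKIP"  # token named in issue #33478


def is_skip_strict(reply: str) -> bool:
    """Assumed fragile behaviour: the whole reply must be the sentinel."""
    return reply.strip() == SENTINEL


def is_skip_tolerant(reply: str) -> bool:
    """Accept the sentinel when it stands alone on any line of the reply."""
    return any(line.strip() == SENTINEL for line in reply.splitlines())


if __name__ == "__main__":
    chatty = "Success!\nREPLY_SKIP"
    print(is_skip_strict(chatty))    # chatter defeats the exact match
    print(is_skip_tolerant(chatty))  # sentinel still recognised on its own line
```

The tolerant variant still rejects replies that merely mention the token inside a sentence, which keeps ordinary prose from being mistaken for a handoff signal.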
Infrastructure & Security
- Sandbox Escapes: Issue #17931 points out a security gap where skill directories are copied into writable sandbox workspaces, allowing agents to potentially modify their own instructions.
- SSRF Risks: Issue #38931 requests a "confirm" mode for private network access to balance the need for local NAS/router management with the risk of malicious internal scanning.
- Auth Regressions: Issue #83558 reports that the device-code authentication method for OpenAI Codex was dropped in v2026.5.12, blocking headless VPS installs.
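The "confirm" mode requested in #38931 could be modelled as a three-way decision on the target address. This is a sketch under stated assumptions: the mode names and the function `private_network_decision` are hypothetical, and a real guard would also need to check the addresses a hostname resolves to, which is omitted here.

```python
import ipaddress


def private_network_decision(host_ip: str, mode: str) -> str:
    """Return 'allow', 'confirm', or 'deny' for an outbound request target.

    `mode` is a hypothetical policy setting applied to internal addresses:
    'allow' (no restriction), 'deny' (block), or 'confirm' (ask the user).
    """
    if mode not in {"allow", "deny", "confirm"}:
        raise ValueError(f"unknown mode: {mode}")
    addr = ipaddress.ip_address(host_ip)
    internal = addr.is_private or addr.is_loopback or addr.is_link_local
    if not internal:
        return "allow"  # public addresses are outside this policy
    return mode  # internal addresses follow the configured mode


if __name__ == "__main__":
    print(private_network_decision("192.168.1.10", "confirm"))
    print(private_network_decision("8.8.8.8", "deny"))
```

With "confirm", managing a local NAS or router stays possible, while an unexpected internal scan requires explicit user approval.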
Key Themes
1. The "Production-Grade" Shift
There is a recurring theme of moving away from "best-effort" AI behavior toward deterministic control. This is evident in requests for:
- Cost Governance: Per-turn model overrides (#83565) and global token budgets (#35203) to prevent "token runaway" in multi-agent loops.
- Deterministic Execution: A proposal for payload.kind = "exec" in cron jobs (#18160), which would bypass the LLM entirely for simple scripts.
- Observability: A strong demand for human-readable live progress logs (#83441), so that users no longer need to parse raw trajectory JSONL files.
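The global token budget idea can be sketched as a counter that refuses a charge before it would overshoot the limit. `TokenBudget` and `BudgetExceeded` are hypothetical names chosen for illustration. The RFC's actual design is not described in this report.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push usage past the configured limit."""


@dataclass
class TokenBudget:
    limit: int
    used: int = 0

    def charge(self, tokens: int) -> int:
        """Record usage and return the remaining budget.

        Raises BudgetExceeded without recording anything if the charge
        would exceed the limit, so a runaway loop stops at a known point.
        """
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            raise BudgetExceeded(
                f"charge of {tokens} exceeds remaining {self.limit - self.used}"
            )
        self.used += tokens
        return self.limit - self.used


if __name__ == "__main__":
    budget = TokenBudget(limit=1000)
    print(budget.charge(400))  # remaining budget after the first turn
```

Sharing one budget object across all agents in a loop gives a hard ceiling for the whole run rather than for each agent separately.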
2. Modality Expansion
Users are pushing the boundaries of what agents can "sense" and "do":
- Audio Integration: Requests to treat audio files as multimodal attachments (#35835) rather than raw binary text.
- Native Search: A push to leverage the free native web search capabilities of Gemini and GLM (#17925) instead of relying on paid third-party APIs.
3. UX Refinement for Power Users
As the toolset grows, the UI has not kept pace. Key requests include a persistent "Active Agent" indicator in the dashboard (#30861) and better conversation management and categorization in WebChat (#27526).
Action Required
Immediate Attention (P0/P1)
- Fix Codex Runtime Flags: Resolve the hardcoded code_mode_only flags in @openclaw/codex to restore tool-using capabilities for Codex agents (#83109).
- Address Event Loop Stalls: Investigate the diagnostic event dispatch path to prevent the gateway from starving the main event loop during concurrent agent bursts (#82936).
- Restore Device-Code Auth: Re-implement the device-code flow for Codex to unblock headless installations (#83558).
Blocked or High-Severity
- Subagent Spawn Logic: Fix the silent failure of OpenAI-family subagent spawns when reasoning is enabled (#83557).
- Sandbox Security: Implement read-only bind mounts for skill directories to prevent agent self-modification (#17931).
- WebChat Hangs: Diagnose the I/O or streaming issue causing tool outputs to stall in v2026.5.12 (#83546).
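For the sandbox item above (#17931), a read-only bind mount might be expressed as follows if the sandbox is launched through the Docker CLI. That is an assumption, since this report does not name the sandbox backend. The helper `skill_mount_args` and the example paths are hypothetical. The `--mount type=bind,...,readonly` syntax itself is standard Docker.

```python
from pathlib import PurePosixPath


def skill_mount_args(host_dir: str, container_dir: str) -> list[str]:
    """Build `docker run` arguments that expose a skill directory read-only.

    The directory is mounted rather than copied, so the agent can read its
    instructions but cannot modify them from inside the container.
    """
    source = PurePosixPath(host_dir)
    target = PurePosixPath(container_dir)
    if not source.is_absolute() or not target.is_absolute():
        raise ValueError("bind mounts need absolute paths")
    return ["--mount", f"type=bind,source={source},target={target},readonly"]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(skill_mount_args("/opt/skills", "/workspace/skills"))
```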