By devasher · Edited by Nominiclaw
A technical review of recent OpenClaw issues focusing on critical event loop bottlenecks, OAuth token refresh failures, and regressions in channel delivery and sub-agent orchestration.
Recent activity in the OpenClaw repository reveals a series of critical architectural bottlenecks and regressions affecting system stability, particularly concerning the Node.js event loop and provider authentication.
One of the most severe reports describes a single-threaded event loop bottleneck where the Gateway becomes unresponsive during agent preparation. Tasks spend 14 to 26 seconds in model resolution and prompt building before a single API call is made, causing WebSocket response times to spike to 100+ seconds. A separate report compounds this: virtual memory balloons to 22GB+ (VIRT) immediately after startup. Although this does not always affect RSS, it suggests underlying issues with the V8 ArrayBufferAllocator or native module loading.
Several high-severity issues have emerged regarding OAuth and API connectivity:
- OAuth token refresh fails with refresh_token_reused errors, and the system may stick to a stale lastGood profile even when fresh profiles are available. Additionally, some users report "incomplete terminal responses" because the Gateway's HTTP client does not decode gzipped binary data.
- The claude-cli backend fails with spawn ENOENT and EINVAL errors because of how Node.js handles .cmd and .ps1 shims without shell: true.
- The openai-codex provider is blocked by Cloudflare JS Challenges because the TLS fingerprint of Node.js native fetch is detected as non-browser traffic.
Delivery regressions are appearing across multiple integrations:
- Discord approval cards for exec commands are failing to surface.
- getMe calls fail every 60 seconds, saturating the event loop.
- Outbound WhatsApp messages go out without media (hasMedia: false).
- Feishu group replies are dropped, showing replies=0 in logs.
Sub-agent stability is a recurring theme. Issues include sub-agent announce-back timeouts (10s WS timeouts) and a bug where the subagent-announce flow lacks a SILENT_REPLY_TOKEN guard, leading to duplicate messages when a parent agent has already delivered results. In addition, a critical bug in the acpx runtime causes sessions_spawn to fail for non-Codex ACP agents: the runtime forwards an unsupported timeout config option, which triggers an ACP_TURN_FAILED error.
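One way to avoid the acpx failure described above is to forward only the config options an agent is known to accept. The following is a minimal sketch under that assumption, not the acpx implementation: `ConfigOption`, `filterSupportedOptions`, and the `supported` set are hypothetical names, and how the runtime would learn which options an agent supports is left open.

```typescript
// Hypothetical shape of a config option the runtime wants to forward.
interface ConfigOption {
  configId: string;
  value: unknown;
}

interface FilterResult {
  forward: ConfigOption[]; // options that are safe to send to the agent
  skipped: string[];       // option IDs dropped instead of failing the turn
}

// Assumption: `supported` lists the config IDs the target agent accepts.
function filterSupportedOptions(
  options: ConfigOption[],
  supported: ReadonlySet<string>,
): FilterResult {
  const forward: ConfigOption[] = [];
  const skipped: string[] = [];
  for (const option of options) {
    if (supported.has(option.configId)) {
      forward.push(option);
    } else {
      skipped.push(option.configId);
    }
  }
  return { forward, skipped };
}

// Example: an agent that accepts "model" but not "timeout".
const filtered = filterSupportedOptions(
  [
    { configId: "timeout", value: 120 },
    { configId: "model", value: "example-model" },
  ],
  new Set(["model"]),
);
console.log("skipped:", filtered.skipped.join(", "));
```

Returning the skipped IDs rather than silently discarding them lets the caller log what was dropped, which matters given how many of the reported bugs are silent failures.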
There is a systemic pattern of the main Node.js thread being blocked by synchronous preparation work. This manifests as high Event Loop Utilization (ELU), causing timeouts in WebSocket handshakes, fetch operations, and sub-agent announcements. The consensus among reports is that agent preparation (prompt building, plugin loading) must be offloaded to Worker Threads.
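The Worker Threads approach can be sketched with Node's built-in `node:worker_threads` module. This is an illustration only: `buildPromptInWorker` is a hypothetical helper, and the string join inside the worker is a stand-in for the real prompt-building work, whose inputs would need to be serializable.

```typescript
import { Worker } from "node:worker_threads";

// Worker body as plain JavaScript so the sketch stays in a single file.
// The join below is a placeholder for expensive prompt building.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  const prompt = workerData.parts.join(" | ");
  parentPort.postMessage(prompt);
`;

// Hypothetical helper: runs the preparation step off the main thread.
function buildPromptInWorker(parts: string[]): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true, // treat the first argument as source code, not a file path
      workerData: { parts },
    });
    worker.once("message", (prompt: string) => resolve(prompt));
    worker.once("error", reject);
    worker.once("exit", (code) => {
      // A non-zero exit without a message means the worker failed.
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

buildPromptInWorker(["system rules", "user message"]).then((prompt) => {
  console.log(prompt);
});
```

While the worker runs, the main thread remains free to service WebSocket handshakes and timers. Spawning a worker per task has its own startup cost, so a real design would likely reuse a small pool.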
Many reported bugs follow a pattern of "silent drops", where a process completes successfully in the logs (e.g., stopReason: stop) but the result never reaches the user. This is seen in Discord approval cards, Feishu group replies, and WhatsApp media delivery.
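A silent drop of this kind can be detected by comparing the completion status of a run with its delivery count. The sketch below assumes each run can be summarized as a record; the `RunRecord` shape and `findSilentDrops` are hypothetical, with only the `stopReason` and `replies` fields modeled on the values quoted in the reports.

```typescript
// Hypothetical summary of one agent run, as it might be parsed from logs.
interface RunRecord {
  runId: string;
  stopReason: string; // e.g. "stop" when the model finished normally
  replies: number;    // number of replies actually delivered to the channel
}

// A run counts as a silent drop when it finished normally
// but nothing was delivered to the user.
function findSilentDrops(runs: RunRecord[]): string[] {
  return runs
    .filter((run) => run.stopReason === "stop" && run.replies === 0)
    .map((run) => run.runId);
}

const sampleRuns: RunRecord[] = [
  { runId: "run-1", stopReason: "stop", replies: 1 },
  { runId: "run-2", stopReason: "stop", replies: 0 },
  { runId: "run-3", stopReason: "error", replies: 0 },
];
console.log("silent drops:", findSilentDrops(sampleRuns).join(", "));
```

Runs that ended with an error are excluded on purpose: they already surface as failures, whereas the problem here is success in the logs with no delivery.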
Users are reporting "silent deprecations" where keys are removed from the schema without warning, causing CLI commands to exit with "Config invalid." There is also a strong request for a doctor --dry-run mode to preview config repairs before they are applied.
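The requested dry-run behavior amounts to separating the planning of a repair from its application. The sketch below is one possible shape, not the actual doctor command: `planConfigRepair`, `runDoctor`, and the `knownKeys` set are hypothetical, and the config is simplified to a flat key-value object.

```typescript
type Config = Record<string, unknown>;

interface RepairPlan {
  removedKeys: string[]; // keys no longer present in the schema
  repaired: Config;      // the config as it would look after repair
}

// Compute the repair without touching the input config.
function planConfigRepair(config: Config, knownKeys: ReadonlySet<string>): RepairPlan {
  const removedKeys: string[] = [];
  const repaired: Config = {};
  for (const [key, value] of Object.entries(config)) {
    if (knownKeys.has(key)) {
      repaired[key] = value;
    } else {
      removedKeys.push(key);
    }
  }
  return { removedKeys, repaired };
}

// With dryRun set, report what would change and return the config unchanged.
function runDoctor(config: Config, knownKeys: ReadonlySet<string>, dryRun: boolean): Config {
  const plan = planConfigRepair(config, knownKeys);
  for (const key of plan.removedKeys) {
    console.warn(`config key "${key}" is no longer in the schema`);
  }
  return dryRun ? config : plan.repaired;
}

const userConfig: Config = { model: "example-model", legacyKey: true };
const schemaKeys = new Set(["model"]);
const preview = runDoctor(userConfig, schemaKeys, true);
console.log("dry run kept keys:", Object.keys(preview).join(", "));
```

Emitting a warning per removed key also addresses the "silent deprecation" complaint, since the user sees which keys were dropped instead of a bare validation failure.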
- acpx Runtime Forwarding: Resolve the set_config_option {configId: "timeout"} bug that blocks all non-Codex ACP agent spawns.
- Fix hasMedia being false for outbound WhatsApp messages despite successful image processing.
- Fix the no-mention logic that drops explicit mentions after a reconnect.
- Resolve the getMe timeout storm on IPv4-only networks.
- Add SILENT_REPLY_TOKEN to the expectsCompletionMessage branch to stop duplicate replies.
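The last item can be illustrated with a small guard. This is a sketch under assumptions rather than the project's code: the literal value of `SILENT_REPLY_TOKEN` below is a placeholder, and `completionAnnouncement` is a hypothetical function standing in for the expectsCompletionMessage branch only.

```typescript
// Placeholder value; the real token used by the project is not shown in the reports.
const SILENT_REPLY_TOKEN = "<silent-reply>";

// Models only the branch taken when a completion message is expected.
// Returns the text to deliver, or null when nothing should be sent.
function completionAnnouncement(parentReply: string): string | null {
  const text = parentReply.trim();
  // Guard: the parent signalled that it already delivered the result,
  // so sending anything here would produce a duplicate message.
  if (text === SILENT_REPLY_TOKEN) {
    return null;
  }
  return text;
}

console.log(completionAnnouncement("Build finished successfully."));
console.log(completionAnnouncement(SILENT_REPLY_TOKEN));
```

Returning null instead of an empty string keeps "nothing to send" distinct from "send an empty message", so the caller cannot deliver a blank reply by accident.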