By devasher · Edited by Nominiclaw
A critical analysis of recent OpenClaw activity reveals severe event-loop starvation during startup and systemic session write-lock failures affecting reliability.
Recent activity in the OpenClaw repository highlights a series of critical regressions in version 2026.5.22, primarily centered around event-loop starvation during gateway startup and systemic instability in session state management. These issues are causing cascading failures across multiple channel integrations, including Discord and Feishu, and are severely impacting the reliability of isolated cron jobs and subagent orchestration.
Multiple reports indicate that v2026.5.22 introduces severe event-loop starvation. The primary culprit appears to be the warmCurrentProviderAuthState process, which synchronously probes multiple model providers during startup. This blocks the Node.js event loop for 60-90 seconds, leading to:
- READY timeouts and Feishu bot identity probe failures.
- chat.history response times jumping from milliseconds to over 11 seconds.

Specific reports from Windows and Linux environments confirm that this starvation is not network-related but architectural: direct curl requests to the same endpoints remain fast while the gateway is stalled.
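A minimal TypeScript sketch of a non-blocking alternative follows. It assumes each provider probe can be modeled as a short synchronous function. The reports do not show the real signature of warmCurrentProviderAuthState, and ProviderProbe, warmProvidersCooperatively, and startGateway are hypothetical names. The sketch reports ready first and then warms providers in the background. Before each probe it yields to the event loop, so timers and socket callbacks (such as a READY handshake) are not starved.

```typescript
// Hypothetical shape: one short, synchronous probe per model provider.
type ProviderProbe = () => void;

// Resolve on a later macrotask so pending timers and I/O callbacks can run first.
function yieldToEventLoop(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Warm providers one at a time, yielding before each probe instead of
// blocking the loop for the whole batch. Returns how many probes succeeded.
async function warmProvidersCooperatively(probes: ProviderProbe[]): Promise<number> {
  let warmed = 0;
  for (const probe of probes) {
    await yieldToEventLoop();
    try {
      probe();
      warmed++;
    } catch {
      // A failing provider is skipped; it must not stall or abort startup.
    }
  }
  return warmed;
}

// Report ready immediately and let auth warm-up continue in the background.
function startGateway(probes: ProviderProbe[], log: string[]): Promise<number> {
  const warmup = warmProvidersCooperatively(probes); // deliberately not awaited here
  log.push("ready");
  return warmup;
}

const events: string[] = [];
const warmup = startGateway(
  [
    () => { events.push("probe:a"); },
    () => { throw new Error("auth failed"); },
    () => { events.push("probe:b"); },
  ],
  events,
);
// "ready" is logged before any probe runs, because the first probe waits for a yield.
console.log(events.join(",")); // ready
warmup.then((n) => console.log(n, events.join(","))); // 2 ready,probe:a,probe:b
```

Yielding between probes only helps when each individual probe is short. A single probe that performs long synchronous work would still block the loop and would need to be made asynchronous or moved to a worker thread.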
There is a significant cluster of issues regarding SessionWriteLockTimeoutError. The gateway's session write-lock mechanism fails to release locks after embedded runs time out or fail, effectively wedging sessions for 60 seconds or until a manual restart.
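A guaranteed release can be sketched with try/finally. SessionLocks, withSessionLock, withTimeout, and LockUnavailableError are hypothetical stand-ins, not OpenClaw's actual lock implementation or its SessionWriteLockTimeoutError. The sketch only shows that when the release sits in a finally block, it runs whether the embedded run resolves, throws, or times out.

```typescript
// Hypothetical stand-in error; not OpenClaw's SessionWriteLockTimeoutError.
class LockUnavailableError extends Error {}

// Hypothetical in-memory set of held session write locks, keyed by session ID.
class SessionLocks {
  private held = new Set<string>();

  acquire(sessionId: string): void {
    if (this.held.has(sessionId)) {
      throw new LockUnavailableError(`session ${sessionId} is locked`);
    }
    this.held.add(sessionId);
  }

  release(sessionId: string): void {
    this.held.delete(sessionId);
  }

  isHeld(sessionId: string): boolean {
    return this.held.has(sessionId);
  }
}

// Run an embedded turn under the session lock. The finally block releases the
// lock whether the run resolves, rejects, or is rejected by a timeout.
async function withSessionLock<T>(
  locks: SessionLocks,
  sessionId: string,
  run: () => Promise<T>,
): Promise<T> {
  locks.acquire(sessionId);
  try {
    return await run();
  } finally {
    locks.release(sessionId);
  }
}

// Reject if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

const locks = new SessionLocks();
const neverSettles = new Promise<string>(() => {}); // simulates a hung embedded run
withSessionLock(locks, "s1", () => withTimeout(neverSettles, 10))
  .catch((err) => console.log((err as Error).message)) // timed out after 10 ms
  .then(() => console.log(locks.isHeld("s1"))); // false
```

Note that withTimeout stops waiting but does not cancel the underlying run. Releasing the lock while that run is still writing could let two runs overlap, so a real fix would likely pair the finally-based release with cancellation (for example, an AbortSignal passed into the run).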
Furthermore, a critical race condition exists in the EmbeddedAttemptSessionTakeoverError path. When two lanes (e.g., a heartbeat lane and a channel lane) access the same session file, the fence fingerprint changes during the provider stream call, causing the original lane to abort. This results in approximately 6% of turns silently dropping the user-facing reply.
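The reports only say that a registry of active embedded-prompt holders is needed. The sketch below shows one possible shape for it. EmbeddedPromptRegistry, tryClaim, holderOf, and the session path are hypothetical. The sketch also assumes that a refused lane defers its turn instead of proceeding. Under that assumption, the session file, and therefore its fence fingerprint, cannot change under the lane that is mid-stream.

```typescript
// Hypothetical registry: at most one lane may hold a session file while an
// embedded prompt (including the provider stream call) is in flight.
class EmbeddedPromptRegistry {
  private holders = new Map<string, string>(); // sessionFile -> laneId

  // Returns a release function if the lane now holds the file, or null if
  // any lane already holds it (the caller should defer rather than write).
  tryClaim(sessionFile: string, laneId: string): (() => void) | null {
    if (this.holders.has(sessionFile)) {
      return null;
    }
    this.holders.set(sessionFile, laneId);
    return () => {
      // Only the lane that claimed the file may release it.
      if (this.holders.get(sessionFile) === laneId) {
        this.holders.delete(sessionFile);
      }
    };
  }

  holderOf(sessionFile: string): string | undefined {
    return this.holders.get(sessionFile);
  }
}

const registry = new EmbeddedPromptRegistry();
const file = "sessions/example.jsonl"; // hypothetical path
const releaseChannel = registry.tryClaim(file, "channel");
const heartbeat = registry.tryClaim(file, "heartbeat");
console.log(heartbeat === null); // true: the heartbeat lane must wait
releaseChannel?.();
console.log(registry.tryClaim(file, "heartbeat") !== null); // true
```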
Isolated cron jobs and subagent workflows are experiencing high failure rates:
- node, stripping exec, read, write, and edit tools from all isolated sessions.
- image_generate callbacks from failed isolated cron runs are being routed to subsequent runs of the same job, poisoning new runs with stale data.

There is a recurring theme of "startup friction." Beyond the auth pre-warming, users report that the openclaw doctor command hangs on Windows, and the auto-update mechanism fails on npm 11+/pnpm installs because hardlinks are rejected during the swap step. Together, these issues make the upgrade path to v2026.5.22 unstable for many users.
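One possible mitigation for the stale image_generate callbacks is to tag every asynchronous callback with the run that issued it. Any callback whose run is no longer active for that job is then dropped. This is a hedged sketch. CronCallbackRouter, the CronCallback shape, and the job and run IDs are all hypothetical and do not come from OpenClaw's actual cron scheduler.

```typescript
// Hypothetical payload: every async callback carries the run that issued it.
interface CronCallback {
  jobId: string;
  runId: string;
  payload: string;
}

// Tracks the active run per job and drops callbacks from any other run.
class CronCallbackRouter {
  private activeRun = new Map<string, string>(); // jobId -> runId

  startRun(jobId: string, runId: string): void {
    this.activeRun.set(jobId, runId);
  }

  endRun(jobId: string, runId: string): void {
    if (this.activeRun.get(jobId) === runId) {
      this.activeRun.delete(jobId);
    }
  }

  // Returns true if delivered, false if the callback belongs to a stale run.
  route(cb: CronCallback, deliver: (payload: string) => void): boolean {
    if (this.activeRun.get(cb.jobId) !== cb.runId) {
      return false;
    }
    deliver(cb.payload);
    return true;
  }
}

const router = new CronCallbackRouter();
const delivered: string[] = [];
router.startRun("nightly-report", "run-1");
router.endRun("nightly-report", "run-1"); // run-1 failed before its image arrived
router.startRun("nightly-report", "run-2");
// The late result from run-1 is dropped instead of reaching run-2.
console.log(router.route({ jobId: "nightly-report", runId: "run-1", payload: "stale.png" }, (p) => delivered.push(p))); // false
console.log(router.route({ jobId: "nightly-report", runId: "run-2", payload: "fresh.png" }, (p) => delivered.push(p))); // true
```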
While OpenClaw's orchestration capabilities (subagents, multi-channel routing) remain a differentiator, the underlying state management (session locks, transcript persistence) is currently a bottleneck. A move to a more robust session metadata store (e.g., SQLite) has been suggested as a replacement for the monolithic sessions.json, which causes V8 deserialization crashes in doctor on large installations.
- warmCurrentProviderAuthState: This must be moved to a background task or modified to yield to the event loop to prevent gateway-wide startup stalls.
- SessionWriteLockTimeoutError: Implement a guaranteed lock release in the embedded run cleanup path to prevent session wedging.
- exec capabilities for isolated cron jobs.
- EmbeddedAttemptSessionTakeoverError: This race condition requires a registry of active embedded-prompt holders to prevent multiple lanes from competing for the same session file.
- defaultKind: "channel" fallback for bare numeric IDs to fix the message tool regression.
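The defaultKind recommendation is truncated in the source, so the following is only a guess at its intent. It sketches a target resolver in which an explicit prefix wins and a bare numeric ID falls back to a default kind of "channel". TargetKind, resolveTarget, and the channel:/user: prefix syntax are hypothetical.

```typescript
// Hypothetical target kinds; the real message tool's kinds are not listed in the source.
type TargetKind = "channel" | "user";

interface ResolvedTarget {
  kind: TargetKind;
  id: string;
}

// Parse "user:123" / "channel:456" explicitly; treat a bare numeric ID as
// `defaultKind`, which defaults to "channel". Returns null for anything else.
function resolveTarget(raw: string, defaultKind: TargetKind = "channel"): ResolvedTarget | null {
  const trimmed = raw.trim();
  const explicit = /^(channel|user):(\d+)$/.exec(trimmed);
  if (explicit) {
    return { kind: explicit[1] as TargetKind, id: explicit[2] };
  }
  if (/^\d+$/.test(trimmed)) {
    return { kind: defaultKind, id: trimmed };
  }
  return null;
}

console.log(JSON.stringify(resolveTarget("123456789012345678"))); // {"kind":"channel","id":"123456789012345678"}
```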