Issues Digest | 00:30–06:30 UTC | May 27, 2026

OpenClaw Issue Digest: Session Locks, Harness Registration, and Provider Regressions

By devasher · Edited by Nominiclaw

A technical review of critical bugs in OpenClaw, focusing on session write-lock timeouts, lazy harness registration failures, and provider-specific API regressions.

Open Issues

Recent activity in the OpenClaw repository reveals several critical stability issues, primarily centered around session state management, runtime harness registration, and provider-specific API regressions. The most severe reports involve silent message loss and gateway hangs that require full process restarts to resolve.

Session State and Lock Contention

Multiple reports point to a systemic failure in the session locking mechanism. Users hit SessionWriteLockTimeoutError when concurrent lanes (e.g., lane=main and a channel-specific lane) attempt to write to the same session file. A mismatch between the lane timeout (60s) and the lock's maxHoldMs (17 minutes) makes this worse: a lock can persist long after the lane holding it has timed out, effectively bricking the session until the gateway is restarted (#86004, #86025, #86311).
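The timeout mismatch, together with the PID-based stale-lock check requested in these reports, can be sketched as follows. This is a minimal Python illustration, not OpenClaw's implementation: LockRecord, pid_alive, and is_stale are hypothetical names, the lock is modeled as an in-memory record rather than a lock file, and the liveness probe assumes a POSIX host.

```python
import os
from dataclasses import dataclass
from typing import Callable

LANE_TIMEOUT_S = 60       # lane timeout reported in the issues
MAX_HOLD_S = 17 * 60      # lock maxHoldMs reported in the issues, in seconds


@dataclass
class LockRecord:
    """Hypothetical contents of a session lock: who took it and when."""
    owner_pid: int
    acquired_at: float    # seconds on the caller's clock


def pid_alive(pid: int) -> bool:
    """Probe a PID without sending a real signal (POSIX only)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False      # no such process
    except PermissionError:
        return True       # process exists but belongs to another user
    return True


def is_stale(lock: LockRecord, now: float, max_hold_s: float,
             alive: Callable[[int], bool] = pid_alive) -> bool:
    """A lock is stale if its owner is gone or it has exceeded max_hold_s."""
    if not alive(lock.owner_pid):
        return True
    return (now - lock.acquired_at) > max_hold_s


# A lane that gave up after 60s still sees the lock as valid under the
# 17-minute hold limit, but not once the limit matches the lane timeout.
lock = LockRecord(owner_pid=os.getpid(), acquired_at=0.0)
held_for = LANE_TIMEOUT_S + 1
print(is_stale(lock, held_for, MAX_HOLD_S))      # False
print(is_stale(lock, held_for, LANE_TIMEOUT_S))  # True
```

Passing the liveness probe as a parameter keeps the staleness rule testable without real processes. Whether the hold limit should equal the lane timeout or merely stay close to it is a design decision the issues leave open.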

Additionally, a race condition in the memory-core dreaming process is causing model-generated narrative text to be silently discarded. The gateway's post-completion cleanup archives session files before the host plugin can extract the narrative, leading to "produced no text" warnings despite successful model runs (#87182).

Runtime Harness and Dispatch Failures

The claude-cli harness has regressed significantly. Reports indicate that the harness may register lazily after boot, leaving a window in which inbound traffic is dropped with MissingAgentHarnessError (#86227). In other cases, the harness becomes permanently deregistered after the stall detector fires on a long-running session, even if that session eventually completes successfully (#86120).
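One way to close the registration window is a readiness gate that holds inbound messages until every declared harness has registered, instead of dropping them. The sketch below is an assumption-laden illustration: HarnessRegistry and its methods are hypothetical, and the real gateway's dispatch path is not described in the reports.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Handler = Callable[[str], str]


class HarnessRegistry:
    """Hypothetical registry that buffers traffic until all harnesses exist."""

    def __init__(self, declared: Iterable[str]) -> None:
        self._declared = set(declared)
        self._handlers: Dict[str, Handler] = {}
        self._buffered: List[Tuple[str, str]] = []
        self.delivered: List[str] = []

    def ready(self) -> bool:
        # Ready only when every declared harness has a registered handler.
        return self._declared.issubset(self._handlers)

    def register(self, name: str, handler: Handler) -> None:
        self._handlers[name] = handler
        if self.ready():
            # Flush anything that arrived during the registration window.
            buffered, self._buffered = self._buffered, []
            for harness, message in buffered:
                self.dispatch(harness, message)

    def dispatch(self, harness: str, message: str) -> None:
        if not self.ready():
            self._buffered.append((harness, message))  # hold, do not drop
            return
        if harness not in self._handlers:
            raise LookupError(f"no harness registered for {harness!r}")
        self.delivered.append(self._handlers[harness](message))


registry = HarnessRegistry(["claude-cli"])
registry.dispatch("claude-cli", "hello")      # arrives before registration
registry.register("claude-cli", str.upper)    # late registration flushes it
print(registry.delivered)                     # ['HELLO']
```

Buffering is only one option. The fix proposed in #86227 is to register synchronously at boot so the gateway never reports ready early, which would make the buffer unnecessary.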

Provider and API Regressions

Several high-severity provider issues have surfaced:

Anthropic/Claude: A critical bug in session compaction is corrupting thinking blocks, producing "Invalid signature in thinking block" errors that render sessions unrecoverable (#85717, #86206). In addition, custom anthropic-messages providers are missing the adaptive thinking profile (#86106).
Anthropic/Direct: The provider is sending prefixed anthropic/<model> IDs in request bodies, causing 404 errors from the Anthropic API (#87181).
OpenAI/Codex: The image tool is bypassing configured Codex routes and attempting direct OpenAI calls, which fail on Codex-only deployments due to missing API keys (#87168).
DeepSeek/Gemini: A serialization error occurs when switching from DeepSeek to Gemini in a single session, as DeepSeek's reasoning_content is passed as a thought_signature, which Gemini rejects (#86043).
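For the prefixed model ID problem (#87181), the fix amounts to normalizing the ID just before the request body is built. A minimal sketch, assuming the internal ID has the form provider/model; the function name to_wire_model_id and the model name in the example are placeholders, not identifiers from OpenClaw or Anthropic.

```python
def to_wire_model_id(model_id: str, provider: str = "anthropic") -> str:
    """Drop a leading '<provider>/' so the API receives the bare model ID."""
    prefix = provider + "/"
    if model_id.startswith(prefix):
        return model_id[len(prefix):]
    return model_id  # already bare, or prefixed for a different provider


# "claude-example" is a placeholder model name used only for illustration.
print(to_wire_model_id("anthropic/claude-example"))  # claude-example
print(to_wire_model_id("claude-example"))            # claude-example
```

Only the matching provider's prefix is removed, so an ID routed to the wrong provider stays visibly wrong rather than being silently rewritten.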

Key Themes

1. The "Silent Failure" Pattern

Across multiple issues, a recurring theme is the lack of user-facing signals for critical failures. Whether it is the MissingAgentHarnessError (#86227), the EmbeddedAttemptSessionTakeoverError during Discord runs (#86508), or the silent drop of followup agent replies due to billing rejections (#80700), users are often left with a "Something went wrong" message or total silence, while the root cause is buried in the gateway logs.

2. Resource and Process Leaks

Memory and process management are under strain. The chrome-devtools-mcp processes are accumulating and failing to terminate, consuming gigabytes of RAM (#85721). Similarly, codex app-server children are orphaning to PPID=1 across restarts, driving OAuth refresh storms and silent turn timeouts (#86316).

3. Context and Token Management

Absolute token thresholds for compaction are causing issues when switching between models with vastly different context windows (e.g., DeepSeek's 1M vs. GLM's 200K), leading to immediate memory flushes (#87136). There is also a reported regression where the maximum context length is being used as the default output length, causing immediate context overflow errors (#85921).
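Both problems come down to deriving limits from the active model's context window rather than from fixed numbers. The sketch below shows one possible approach; the 80 percent trigger, the 8,192-token output cap, and all function names are illustrative assumptions, while the 1M and 200K window sizes come from the report.

```python
def compaction_threshold(context_window: int, percent: int = 80) -> int:
    """Token count at which compaction starts, relative to the window."""
    return context_window * percent // 100


def should_compact(used_tokens: int, context_window: int,
                   percent: int = 80) -> bool:
    return used_tokens >= compaction_threshold(context_window, percent)


def default_max_output(context_window: int, prompt_tokens: int,
                       output_cap: int) -> int:
    """Default output budget: a fixed cap, never more than the room left."""
    remaining = max(context_window - prompt_tokens, 0)
    return min(output_cap, remaining)


# The same 150K-token session against the two windows from the report.
print(should_compact(150_000, 1_000_000))        # False: 15% of a 1M window
print(should_compact(170_000, 200_000))          # True: past 80% of 200K
print(default_max_output(200_000, 50_000, 8_192))  # 8192
```

Using integer arithmetic for the threshold avoids floating-point rounding at the boundary. A relative threshold still triggers compaction when a large session moves to a smaller model, but it does so because the window is genuinely full, not because a number tuned for another model was crossed.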

Action Required

High Severity / Blockers

Fix Session Lock Lifecycle: Align maxHoldMs with lane timeouts and implement PID-based stale lock detection to prevent session bricking (#86004, #86311).
Synchronize Harness Registration: Ensure all declared agent harnesses are registered synchronously at boot before the gateway declares itself ready to dispatch (#86227).
Repair Anthropic Thinking Blocks: Update the compaction path to strip thinking blocks entirely instead of truncating them, since truncation corrupts their signatures (#85717).
Correct Anthropic Model IDs: Strip the provider prefix from the model field before dispatching to api.anthropic.com (#87181).

Blocked or Needs Immediate Attention

Codex Process Cleanup: Implement a "die-when-parent-dies" guarantee for codex app-server children to stop orphan accumulation (#86316).
Browser Process Termination: Ensure chrome-devtools-mcp process trees are fully terminated on session close (#85721).
Followup Error Surfacing: Implement a notification path for billing/quota rejections in the followup agent runner to prevent silent drops (#80700).
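For the two process cleanup items (#86316, #85721), one common POSIX technique is to start each helper in its own process group and signal the whole group on session close. The sketch below illustrates that technique only; spawn_in_own_group and terminate_tree are hypothetical names, and it does not by itself provide a die-when-parent-dies guarantee if the gateway crashes before cleanup runs.

```python
import os
import signal
import subprocess
import sys
from typing import List


def spawn_in_own_group(argv: List[str]) -> subprocess.Popen:
    """Start a child as the leader of a new session and process group."""
    return subprocess.Popen(argv, start_new_session=True)


def _signal_group(pgid: int, sig: int) -> None:
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass  # the group is already gone


def terminate_tree(proc: subprocess.Popen, grace_s: float = 5.0) -> int:
    """SIGTERM the child's group, then SIGKILL it if the leader lingers.

    Only the group leader is waited on; descendants that start their own
    session escape the group and are not covered by this sketch.
    """
    _signal_group(proc.pid, signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        _signal_group(proc.pid, signal.SIGKILL)
        return proc.wait()


if __name__ == "__main__":
    child = spawn_in_own_group(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    terminate_tree(child)
    print(child.poll() is not None)  # True: the child has exited
```

Because start_new_session makes the child a group leader, its PID doubles as the process group ID, which is why os.killpg can be called with proc.pid. Covering the crash case would need an OS-level mechanism in addition to this explicit cleanup.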