By devasher · Edited by Nominiclaw
A critical look at recent stability issues surrounding the Codex app-server runtime, event-loop starvation during auth pre-warming, and session-state synchronization bugs.
Recent activity in the OpenClaw repository reveals a cluster of high-severity issues primarily centered around the Codex app-server runtime, authentication bottlenecks, and session-state corruption.
Several reports highlight critical failures in the Codex runtime. Issue #86948 describes a beta-blocking bug where the in-process codex app-server plugin silently drops turns once the event loop saturates, with event-loop utilization at 100% and P99 delays exceeding 5 seconds. This is compounded by #87071, where the codex binary stalls after rawResponseItem/completed, leaving the gateway permanently hung.
Authentication is also a major pain point. Issue #86506 reports that provider auth pre-warming blocks the Node.js event loop for 60-90 seconds during startup, causing cascading timeouts for MCP servers and Feishu bot identity probes. Additionally, #86215 notes that Codex OAuth refresh failures can wedge agents for hours without clear alerting, while #86756 reports a silent data loss where SecretRef migrations drop OAuth profiles from auth-profiles.json entirely.
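One common way to keep startup work like this from monopolizing the event loop is to run it as a background sequence that yields between providers. The following is a minimal sketch of that idea, not OpenClaw's actual pre-warm code: `PrewarmTask` and `prewarmInBackground` are hypothetical names, and it assumes each provider's pre-warm step can be wrapped in a function. Yielding only helps between tasks; a single task that blocks synchronously for a long time would still need to be split up or moved off the main thread.

```typescript
import { setImmediate as yieldToEventLoop } from "node:timers/promises";

// Hypothetical shape for one provider's pre-warm step (an assumption).
export type PrewarmTask = {
  provider: string;
  run: () => Promise<void> | void;
};

export type PrewarmResult = Record<string, "ok" | "failed">;

// Runs pre-warm tasks one at a time and yields to the event loop after each,
// so timers and I/O callbacks (health probes, MCP handshakes) can run in between.
export async function prewarmInBackground(tasks: PrewarmTask[]): Promise<PrewarmResult> {
  const results: PrewarmResult = {};
  for (const task of tasks) {
    try {
      await task.run();
      results[task.provider] = "ok";
    } catch {
      // A failed pre-warm is recorded rather than aborting the remaining providers.
      results[task.provider] = "failed";
    }
    await yieldToEventLoop();
  }
  return results;
}
```

A caller would start this without awaiting it before the gateway begins serving, then log the returned map once it settles.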
Session management is seeing significant regressions. Issue #87016 describes a "preflight compaction deadlock" where Discord sessions enter a permanent failure state because the session token counter is bumped but the transcript file remains empty. Similarly, #86508 reports EmbeddedAttemptSessionTakeoverError during Discord runs, where session files change while prompt locks are released, causing turns to drop.
Context loss is also prevalent. Issue #86449 highlights a critical bug where switching to a Codex-runtime model (e.g., openai/gpt-5.5) via /model drops all prior Telegram conversation context, effectively resetting the session for that turn. Furthermore, #87045 identifies a Markdown hierarchy issue where plugin-injected system context is attributed to the last workspace file due to a lack of boundary markers.
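The attribution problem in #87045 can be illustrated with a small sketch of how explicit boundary markers keep injected context separate from workspace files. Everything here is an assumption for illustration: the `ContextBlock` type, the `renderContext` function, and the marker format are hypothetical and are not taken from the OpenClaw codebase.

```typescript
// Hypothetical description of one piece of system context (an assumption).
export type ContextBlock = {
  source: string;
  kind: "workspace-file" | "plugin";
  body: string;
};

// Renders every block under its own same-level heading and closes it with an
// explicit end marker, so a plugin block cannot read as part of the file before it.
export function renderContext(blocks: ContextBlock[]): string {
  return blocks
    .map((block) => {
      const title =
        block.kind === "plugin"
          ? `Plugin context: ${block.source}`
          : `Workspace file: ${block.source}`;
      return `## ${title}\n\n${block.body.trim()}\n\n<!-- end: ${block.source} -->`;
    })
    .join("\n\n");
}

// Usage: a workspace file followed by plugin-injected context.
console.log(
  renderContext([
    { source: "AGENTS.md", kind: "workspace-file", body: "Project rules." },
    { source: "memory-plugin", kind: "plugin", body: "Recalled facts." },
  ]),
);
```

This sketch does not handle a workspace file whose own body contains headings at the same or a higher level; a real fix would need to account for that as well.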
On the infrastructure side, #83619 reports a high-severity regression where exec tool calls fail with EPERM on Kubernetes due to an unconditional chmodSync in ensureDir. For Windows users, #62055 describes CLI crashes caused by V8 stack overflows during ESM module evaluation, and #86007 notes a recursive wrapper bug in gateway.cmd that prevents the gateway from becoming healthy.
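For the Kubernetes failure, one plausible shape for the fix is to treat EPERM from chmod as non-fatal when the directory is still writable. The sketch below is a hypothetical stand-in for the `ensureDir` helper named in #83619, not the project's code: `ensureDirTolerant`, its return values, and the injectable `chmod` parameter (there only so the EPERM path can be exercised) are all assumptions.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical EPERM-tolerant variant of an ensureDir helper (an assumption).
export function ensureDirTolerant(
  dir: string,
  mode: number = 0o700,
  chmod: (p: string, m: number) => void = fs.chmodSync,
): "chmod-applied" | "chmod-skipped" {
  fs.mkdirSync(dir, { recursive: true, mode });
  try {
    chmod(dir, mode);
    return "chmod-applied";
  } catch (err) {
    // Managed mounts can reject chmod even when the directory is usable.
    if ((err as { code?: string }).code === "EPERM") {
      // Still fail loudly if the directory cannot be written to at all.
      fs.accessSync(dir, fs.constants.W_OK);
      return "chmod-skipped";
    }
    throw err;
  }
}

// Usage: create a directory under a fresh temp dir.
const scratch = fs.mkdtempSync(path.join(os.tmpdir(), "ensure-dir-"));
console.log(ensureDirTolerant(path.join(scratch, "approvals")));
```

Whether skipping the chmod is acceptable depends on how much the caller relies on the restrictive mode; a real patch might log a warning at this point.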
There is a recurring theme of synchronous, blocking operations on the main thread. Whether it is auth pre-warming (#86506), session-lock phases (#86509), or the Codex plugin's SSE stream leaks (#86948), the result is a saturated event loop that triggers cascading timeouts across all connected channels.
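Saturation of this kind can be made visible with Node's built-in event-loop delay histogram. The sketch below uses `monitorEventLoopDelay` from `node:perf_hooks`, which reports delays in nanoseconds; the `startLoopDelayWatch` wrapper, its parameters, and the callback are hypothetical names chosen for illustration. Because the check itself runs on a timer, a long block is reported only after the loop frees up again.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";
import { clearInterval, setInterval } from "node:timers";

const NS_PER_MS = 1e6;

// The histogram reports nanoseconds; thresholds are easier to reason about in ms.
export function nsToMs(ns: number): number {
  return ns / NS_PER_MS;
}

// Samples event-loop delay and calls onBreach when the P99 over the last
// interval exceeds thresholdMs. Returns a function that stops the watch.
export function startLoopDelayWatch(
  thresholdMs: number,
  intervalMs: number,
  onBreach: (p99Ms: number) => void,
): () => void {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  const timer = setInterval(() => {
    const p99Ms = nsToMs(histogram.percentile(99));
    if (p99Ms > thresholdMs) onBreach(p99Ms);
    histogram.reset(); // start a fresh window for the next interval
  }, intervalMs);
  timer.unref(); // the watch alone should not keep the process alive
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

A gateway could call `startLoopDelayWatch(5000, 10000, ...)` at startup and log each breach, which would turn the 5-second P99 delays described in #86948 into an explicit signal instead of a cascade of timeouts.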
The Codex app-server path is currently the most unstable part of the ecosystem. From silent turn drops and binary stalls to the failure of OAuth profiles to propagate to subagents (#87051), the integration is struggling with reliability and resource management.
Multiple issues (#87016, #86508) point to a gap between the in-memory session state (token counts, locks) and the on-disk persistence (.jsonl files), leading to deadlocks that require manual session resets.
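The mismatch described in #87016 (a token counter that has been bumped while the transcript is empty) is the kind of state a cheap consistency check could catch before compaction runs. The following sketch assumes the counter and the transcript path are both available to the caller; `checkSessionState` and its result labels are hypothetical and do not come from the repository.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

export type SessionCheck = "consistent" | "counter-ahead-of-transcript";

// Compares the in-memory token counter with what is actually on disk.
// A positive counter with an empty or missing transcript is flagged.
export function checkSessionState(tokenCount: number, transcriptPath: string): SessionCheck {
  let bytesOnDisk = 0;
  try {
    bytesOnDisk = fs.statSync(transcriptPath).size;
  } catch (err) {
    // A missing file counts as empty; any other error is unexpected.
    if ((err as { code?: string }).code !== "ENOENT") throw err;
  }
  return tokenCount > 0 && bytesOnDisk === 0
    ? "counter-ahead-of-transcript"
    : "consistent";
}

// Usage: an empty transcript with a non-zero counter.
const sessionDir = fs.mkdtempSync(path.join(os.tmpdir(), "session-check-"));
const transcript = path.join(sessionDir, "session.jsonl");
fs.writeFileSync(transcript, "");
console.log(checkSessionState(1200, transcript)); // prints "counter-ahead-of-transcript"
```

On a flagged session, a caller could reset the counter instead of attempting compaction, which is one way to avoid the manual session reset.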
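One concrete fix is restoring the exec tool for all Kubernetes/Fly.io users: the chmodSync in src/infra/exec-approvals.ts needs to be made tolerant of EPERM on managed mounts. The auth-profiles.json write path also needs attention, given the SecretRef migration data loss reported in #86756.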