PRs Digest · 06:30–12:30 UTC · May 18, 2026

OpenClaw Update: Prompt Surface Separation, Realtime Voice Stability, and Media Processing Hardening

By devasher · Edited by Nominiclaw

This update introduces a critical split in prompt surfaces to prevent instruction leakage across runtimes, stabilizes Discord realtime voice, and centralizes media processing with robust fallback chains.

Merged PRs

Separate prompt surfaces by selected harness #83454
fix: fall back from official ClawHub artifact blocks #83566
Fix Discord realtime voice playback stability #80505
fix(telegram): harden spool timeout recovery #83575
fix: harden image metadata fallback #83579
fix(code-mode): honor agent scoped code mode #83473
feat(admin-http-rpc): allow web QR login methods #83259
fix: add resilient media processing fallbacks #83568
Fix Telegram topic media completion delivery #83556
fix(android): use realtime relay for talk mode #83130
fix(codex): stop forcing code-mode-only turns #83561
Reject empty CLI subprocess replies #83421
[Fix] Defer gateway update check startup #83520
fix(messages): apply TTS before message-tool sends #83543
fix(qqbot): shorten typing keepalive window #83469
fix: harden release stability recovery and auth fallback #83503
chore(lint): enable no-underscore-dangle with comprehensive allow list #83422
fix(codex): hydrate queued inbound images #83533
fix(tui): bound standalone exit #83501
fix(messages): keep group visible replies automatic by default #83498
Load provider owner for Codex harness runtime #83519
fix(native-pi): pass Telegram images to Ollama #83516
fix(qa): use supported telegram streaming config in rtt #83514
fix(qa): use final telegram replies for rtt runs #83509
fix(telegram): recover stalled isolated spool handlers #83505
fix(codex): preserve sandbox egress for app-server turns #83502
refactor(cron): centralize source delivery plan #83377
[codex] Fix Discord progress mode dropping final replies #83443
[Test] Add gateway restart benchmark tooling #83299
[Perf] Overlap gateway startup work before ready #83301

Key Changes

Prompt Engineering and Runtime Isolation

One of the most significant architectural shifts is Prompt Surface Separation. Previously, prompt fragments were shared across runtimes, creating a "double-prompting" risk in which instructions for one harness (e.g., PI) leaked into another (e.g., the native Codex app-server). Prompts are now routed by the selected harness (PI, CLI, ACP, Codex app-server, or Subagent), so each runtime receives only the guidance written for its own operational context.
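A minimal sketch of harness-scoped prompt selection follows, assuming each prompt fragment is tagged with the harnesses it targets. The `Harness`, `PromptFragment`, and `buildSystemPrompt` names are hypothetical illustrations, not OpenClaw's internal API.

```typescript
// Hypothetical model of harness-scoped prompt fragments; names are illustrative,
// not OpenClaw's actual API.
type Harness = "pi" | "cli" | "acp" | "codex-app-server" | "subagent";

interface PromptFragment {
  id: string;
  text: string;
  // Harnesses this fragment is written for; a fragment with an empty list is never selected.
  surfaces: readonly Harness[];
}

// Keep only fragments explicitly tagged for the selected harness, so guidance
// written for one runtime cannot leak into another.
function buildSystemPrompt(fragments: readonly PromptFragment[], harness: Harness): string {
  return fragments
    .filter((f) => f.surfaces.includes(harness))
    .map((f) => f.text)
    .join("\n\n");
}

const fragments: PromptFragment[] = [
  {
    id: "shared-safety",
    text: "Follow the operator's safety policy.",
    surfaces: ["pi", "cli", "acp", "codex-app-server", "subagent"],
  },
  { id: "pi-tools", text: "Use the PI tool-call format.", surfaces: ["pi"] },
  { id: "codex-native", text: "Rely on the app-server's native tools.", surfaces: ["codex-app-server"] },
];

console.log(buildSystemPrompt(fragments, "codex-app-server"));
```

For the `codex-app-server` harness this yields the shared safety fragment and the native-tools fragment, but not the PI tool-call guidance.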

Additionally, the Codex harness received several critical updates:

Code Mode Flexibility: Fixed a regression where Codex app-server threads were forced into code_mode_only, which stalled tool-using turns. It now defaults to code_mode=true but code_mode_only=false.
Agent-Scoped Config: The system now honors codeMode settings defined at the per-agent level, allowing operators to test code-mode on specific agents without a fleet-wide change.
Sandbox Egress: Fixed a critical bug where sandboxed agents lost network access during Codex app-server turns; network access is now derived from the OpenClaw sandbox egress configuration.
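The precedence implied by these three fixes can be sketched as below. This is an illustration under assumptions: the config shapes and `resolveTurnOptions` are hypothetical, and treating unsandboxed agents as always having network access is an assumption rather than documented behavior.

```typescript
// Hypothetical config shapes; field names mirror the terms used above
// (codeMode, codeModeOnly, sandbox egress) but are not OpenClaw's real schema.
interface CodeModeSettings {
  codeMode?: boolean;
  codeModeOnly?: boolean;
}

interface AgentConfig {
  id: string;
  codeMode?: CodeModeSettings; // per-agent override
  sandbox?: { egress: "allow" | "deny" };
}

interface ResolvedTurnOptions {
  codeMode: boolean;
  codeModeOnly: boolean;
  networkAccess: boolean;
}

// Defaults described above: code mode on, but not exclusive, so tool-using turns are not stalled.
const DEFAULTS: Required<CodeModeSettings> = { codeMode: true, codeModeOnly: false };

function resolveTurnOptions(fleet: CodeModeSettings, agent: AgentConfig): ResolvedTurnOptions {
  // Precedence: agent-scoped setting, then fleet-wide setting, then defaults.
  const codeMode = agent.codeMode?.codeMode ?? fleet.codeMode ?? DEFAULTS.codeMode;
  const codeModeOnly = agent.codeMode?.codeModeOnly ?? fleet.codeModeOnly ?? DEFAULTS.codeModeOnly;
  // Network access follows the sandbox egress setting instead of being dropped
  // whenever a sandbox is present. ASSUMPTION: unsandboxed agents keep access.
  const networkAccess = agent.sandbox ? agent.sandbox.egress === "allow" : true;
  return { codeMode, codeModeOnly, networkAccess };
}
```

Resolving the agent override first is what lets an operator enable code mode on a single agent while the fleet-wide setting stays unchanged.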

Realtime Voice and Integration Stability

Voice and chat integrations received significant stability fixes:

Discord Realtime Voice: Addressed a bug where OpenAI gpt-realtime-2 sessions would stop recognizing speech after the first reply. This was solved by disabling noise_reduction on the backend bridge and implementing raw PCM prebuffering to eliminate audio stutter.
Android Talk Mode: Migrated from a legacy STT/TTS pipeline to the modern Gateway relay voice session API, enabling low-latency streaming audio and realtime tool-call handling.
Telegram Reliability: Hardened the isolated-ingress spool handlers so they recover from stalled updates: stuck claims are marked as .failed tombstones, and account-scoped work is aborted before the handler restarts.
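The PCM prebuffering used in the Discord fix can be illustrated with a small buffer that holds back playback until a minimum amount of audio has arrived. The 24 kHz, 16-bit mono format and the 200 ms target are assumptions chosen for the arithmetic (24,000 samples/s × 2 bytes × 0.2 s = 9,600 bytes), and `PcmPrebuffer` is a hypothetical class, not the bridge's actual code.

```typescript
// Minimal sketch of raw PCM prebuffering before playback starts.
// ASSUMPTIONS (not from OpenClaw): 24 kHz, 16-bit, mono PCM and a 200 ms prebuffer target.
const SAMPLE_RATE_HZ = 24_000;
const BYTES_PER_SAMPLE = 2; // 16-bit PCM
const PREBUFFER_MS = 200;
const PREBUFFER_BYTES = (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * PREBUFFER_MS) / 1000; // 9600 bytes

class PcmPrebuffer {
  private pending: Uint8Array[] = [];
  private pendingBytes = 0;
  private started = false;

  constructor(private readonly play: (chunk: Uint8Array) => void) {}

  // Queue chunks until enough audio is buffered, then flush and pass later chunks straight through.
  push(chunk: Uint8Array): void {
    if (this.started) {
      this.play(chunk);
      return;
    }
    this.pending.push(chunk);
    this.pendingBytes += chunk.byteLength;
    if (this.pendingBytes >= PREBUFFER_BYTES) {
      this.flush();
    }
  }

  // Start playback with whatever is buffered (e.g. when a reply ends before the target is reached).
  flush(): void {
    this.started = true;
    for (const chunk of this.pending) this.play(chunk);
    this.pending = [];
    this.pendingBytes = 0;
  }

  // Re-arm for the next reply; any audio still buffered is discarded (e.g. on barge-in).
  reset(): void {
    this.pending = [];
    this.pendingBytes = 0;
    this.started = false;
  }
}
```

Pushing two 4,800-byte chunks (100 ms each) produces no playback until the second chunk arrives, at which point both are played in order; later chunks pass straight through until `reset()` re-arms the buffer for the next reply.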

Media Processing and Vision

Image processing failed on fresh installs where the sharp dependency was missing. To fix this, OpenClaw now centralizes its media helpers in media-services, which introduces a sharp-first backend chain with fallbacks to sips, Windows native imaging, ImageMagick, GraphicsMagick, and ffmpeg.
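The fallback behavior can be sketched as an ordered list of backends in which the first one that succeeds wins and every failure is recorded for diagnostics. The `ImageBackend` interface, `resizeImage`, and `AllBackendsFailedError` are hypothetical; the real media-services helpers may differ in shape and in how they detect an unavailable backend.

```typescript
// Minimal sketch of a "first backend that works wins" chain. The interface and
// function names are hypothetical, not the media-services API.
interface ImageBackend {
  name: string;
  // Resolves with output bytes, or rejects if the backend is unavailable or fails.
  resize(input: Uint8Array, maxEdgePx: number): Promise<Uint8Array>;
}

interface BackendFailure {
  backend: string;
  error: unknown;
}

class AllBackendsFailedError extends Error {
  constructor(readonly failures: BackendFailure[]) {
    super(`all image backends failed: ${failures.map((f) => f.backend).join(", ")}`);
  }
}

async function resizeImage(
  backends: readonly ImageBackend[],
  input: Uint8Array,
  maxEdgePx: number,
): Promise<{ backend: string; output: Uint8Array }> {
  const failures: BackendFailure[] = [];
  for (const b of backends) {
    try {
      return { backend: b.name, output: await b.resize(input, maxEdgePx) };
    } catch (error) {
      // Record the failure and fall through to the next backend in the chain.
      failures.push({ backend: b.name, error });
    }
  }
  throw new AllBackendsFailedError(failures);
}
```

With stub backends, a chain of sharp (not installed), sips, and ffmpeg resolves with sips, and a chain in which every backend fails rejects with an error listing each attempted backend.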

Vision capabilities were also expanded:

Inbound Image Hydration: Fixed a bug where the Codex app-server dropped inbound image attachments; it now correctly hydrates MediaPath into queued followup images.
Ollama Integration: Native PI runs now properly resolve Telegram image media into image blocks for Ollama vision models, preventing the model from silently ignoring visual context.
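For the Ollama path, the end result is a chat message whose `images` field carries base64-encoded image data, which is how Ollama's `/api/chat` endpoint accepts images for vision models. The `InboundAttachment` shape and `toOllamaMessage` helper below are hypothetical, and the sketch assumes attachments have already been read from their MediaPath and base64-encoded.

```typescript
// Minimal sketch: turning inbound media attachments into an Ollama /api/chat message.
// The `images` field (base64 strings) is part of Ollama's chat API; InboundAttachment and
// toOllamaMessage() are hypothetical stand-ins for OpenClaw's media pipeline.
interface InboundAttachment {
  mediaPath: string;
  mimeType: string;
  base64: string; // ASSUMPTION: already read and encoded by the media loader
}

interface OllamaChatMessage {
  role: "user";
  content: string;
  images?: string[];
}

function toOllamaMessage(text: string, attachments: readonly InboundAttachment[]): OllamaChatMessage {
  // Keep image attachments only; audio, documents, etc. are not vision input.
  const images = attachments.filter((a) => a.mimeType.startsWith("image/")).map((a) => a.base64);
  // Omit the field entirely when there are no images rather than sending an empty array.
  return images.length > 0 ? { role: "user", content: text, images } : { role: "user", content: text };
}
```

The key property is that an image attached to the inbound Telegram message ends up in `images`; if it were dropped, the model would receive only the text and could not see the picture.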

Gateway Performance and Tooling

Gateway startup latency was reduced by overlapping independent work (such as startup logging and plugin service initialization) before the ready state is returned. To maintain these gains, a new Gateway restart benchmark tool (pnpm test:restart:gateway) was added to provide machine-readable evidence of restart readiness and resource slopes.
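One way to turn repeated restarts into a "resource slope" is an ordinary least-squares fit of a metric against the restart iteration. This is a sketch under assumptions: the `RestartSample` fields, the `summarize` output, and the choice of least squares are illustrative and do not describe the actual output of `pnpm test:restart:gateway`.

```typescript
// Minimal sketch of a restart-benchmark summary. Sample shape and JSON fields are
// hypothetical, not the output format of pnpm test:restart:gateway.
interface RestartSample {
  iteration: number;
  readyMs: number; // time from restart to ready
  rssMiB: number; // resident memory after the restart
}

// Ordinary least-squares slope of y against x.
function slope(xs: readonly number[], ys: readonly number[]): number {
  const n = xs.length;
  if (n < 2 || n !== ys.length) throw new Error("need at least two paired samples");
  const meanX = xs.reduce((s, x) => s + x, 0) / n;
  const meanY = ys.reduce((s, y) => s + y, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  if (den === 0) throw new Error("x values must not all be equal");
  return num / den;
}

function summarize(samples: readonly RestartSample[]) {
  const xs = samples.map((s) => s.iteration);
  return {
    restarts: samples.length,
    maxReadyMs: Math.max(...samples.map((s) => s.readyMs)),
    // Near zero suggests memory is reclaimed across restarts; a steady positive slope hints at a leak.
    rssSlopeMiBPerRestart: slope(xs, samples.map((s) => s.rssMiB)),
    readySlopeMsPerRestart: slope(xs, samples.map((s) => s.readyMs)),
  };
}
```

For four restarts with RSS of 100, 102, 104, and 106 MiB and a constant 500 ms ready time, `summarize` reports an RSS slope of 2 MiB per restart and a ready-time slope of 0; passing the result to `JSON.stringify` gives machine-readable output that can be compared across runs.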

Impact

These changes collectively resolve several high-severity pain points for power users and operators:

Reduced Hallucinations: Separating prompt surfaces and fixing image hydration make models less likely to confabulate visual observations or follow irrelevant runtime instructions.
Improved Reliability: The fixes for Discord voice and Telegram spooling eliminate "silent failures" where the bot appears healthy but stops responding to user input.
Developer Experience: The addition of restart benchmarking and the fix for code_mode stalling provide operators with better visibility and more predictable behavior during agent evaluation.
Security and Connectivity: The fix for sandbox egress ensures that research agents can maintain necessary outbound network access without compromising the security of the sandbox environment.