arxiv:2606.17182

Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems

Published on Jun 15

· Submitted by

Authors:

Abstract

Multi-agent LLM systems with shared state are analyzed through formal methods identifying concurrency anomalies and establishing a verified consistency hierarchy with mechanized proofs of soundness and completeness.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. We model such sharing as long-running read-generate-write operations under deterministic-generation semantics -- the regime durable-execution engines enforce by deterministic replay -- and formalize four concurrency anomalies in TLA+: stale-generation, phantom-tool, causal-cascade, and tool-effect reordering, structural analogues of classical isolation anomalies, each with a TLC counter-example. The exclusion lattice over these anomalies is trivial; the contribution is the mechanically verified realizability and strict separation of one maximal chain within it, L_0 subsetneq cdots subsetneq L_4, to our knowledge the first machine-checked consistency hierarchy for such runtimes. A development of 274 Verus obligations (zero assume, zero admit; trust base: two structural axioms and a mutex correspondence) proves the detectors sound and complete against the specifications and each runtime its avoidance set. Three deployed Rust runtimes realize L0-L1 (pessimistic locking, serializable snapshot isolation, default-SI), each verified against stale-generation and refined to its state machine; L2-L4 are exec-mode-verified with dependency-free prevention twins (A3, A6, A2: 0/1000 versus 1000/1000), and L2 is run live across three model families (A3 prevented in all 120 retracted sessions). We reproduce a silent lost update in ByteDance's deer-flow, formalizing its fix as a verified L_0 to L_1 refinement, and exhibit tool-effect reordering in LangGraph's ToolNode on unmodified output, removed by an L3 commit-order sequencer. The verified detector, refinements, and realizability artifacts are the contribution; the phenomena and lattice are classical.

View arXiv page View PDF GitHub 0 Add to collection

Community

sajjadanwar0

Paper submitter about 2 hours ago

TL;DR — A machine-checked consistency hierarchy for multi-agent LLM runtimes, with two reproduced in-the-wild bugs and a deployed Rust runtime.

Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. This paper models that sharing as long-running read–generate–write operations and formalizes four concurrency anomalies in TLA+ — stale-generation, phantom-tool, causal-cascade, and tool-effect reordering — as structural analogues of classical database-isolation anomalies, each with a TLC counter-example.

What is actually verified and built:

A strictly-separated chain L0 ⊊ L1 ⊊ L2 ⊊ L3 ⊊ L4 over those anomalies. 274 Verus obligations (0 assume, 0 admit; trust base = two structural axioms + a mutex correspondence) prove the detectors sound and complete and each runtime its avoidance set, plus a TLAPS-checked chain coherence (15 obligations) and an A1 generation-phase lower bound (28 obligations).
Three deployed Rust runtimes at L0–L1 (pessimistic locking, serializable snapshot isolation, default-SI), each verified against stale-generation. L2 is run live across three model families — A3 prevented in all 120 supervisor-retracted sessions (0/120).
Two reproduced in-the-wild findings: a silent lost update in ByteDance's deer-flow (formalized as a verified L0→L1 fix), and tool-effect reordering in LangGraph's ToolNode on unmodified output (removed by an L3 commit-order sequencer).
Prevention costs are bounded, not catastrophic: SSI within sampling noise on one workload; pessimistic locking 1.6–2.3× on the contended cell.

Scope, stated plainly: the anomalies and the lattice are classical. The contribution is the mechanized realizability and strict separation, the executable runtimes, and a fully reproducible artifact (Verus + TLA+ + GenMC, the Rust runtime, and the Python harnesses). The supplementary appendix (§§ A–F, Tables S1–S6) is attached as an arXiv ancillary file.

Paper: https://arxiv.org/abs/2606.17182
Code & specs: https://github.com/sajjadanwar0/mac-consistency

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2606.17182

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2606.17182 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2606.17182 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2606.17182 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.