Qwen2.5-3B Iterative CX LAM (Real Browser + VLM)

This project is a fully local, real-world Large Action Model (LAM) for customer experience (CX) and CRM workflows. Built on a fine-tuned Qwen2.5-3B-Instruct-4bit (via MLX + LoRA on Apple Silicon), it generates and executes structured sequences of CRM tool calls—primarily crm.screenpop, crm.create_case, and crm.log_call—directly against a live browser-based CRM UI using Playwright. Unlike mock or simulated systems, every action (clicks, fills, events) occurs in the real rendered DOM, with state grounded in the actual application backend. Visual perception is provided by a real VLM (mlx-vlm on screenshots) that describes live screen content (contacts, cases, UI elements), which is injected into prompts alongside episodic memory.

The system runs as a closed-loop iterative agent. It performs one action per model call, receives real execution results plus fresh VLM visual context and relevant memory traces (dense embeddings + full prior step histories with reasoning), then reasons explicitly before the next action. A pure policy layer ensures logical ordering (e.g., cases must exist before logging). It supports hybrid outputs (tools + natural dialogue), stores complete traceable sessions (including screenshots and reasoning), and has been validated at 100% success on representative scenarios with correct final state (cases + logs created). The entire stack—model inference, vision, memory, browser execution, and data flywheel—is local, traceable, and designed for self-improvement through real traces.

This is a research / proof-of-concept release. It demonstrates end-to-end local LAM construction with real browser + VLM grounding.

Model Details

Base: Qwen2.5-3B-Instruct (4-bit mlx-community version)
Method: LoRA (rank 16 in final run)
Training: 500 iterations on Apple M4 Max on iterative step-by-step data (maximized with real VLM traces)
Adapter size: ~26.6 MB
Training data: Iterative step-by-step expansions derived from full trajectories + real execution traces with VLM visuals and reasoning
Total trajectories generated: 2,346 (from two public sources) + ongoing real traces

Datasets Used

Trajectories were synthesized from:

bitext/Bitext-customer-support-llm-chatbot-training-dataset (1,996 trajs)
knkarthick/dialogsum (350 trajs, used as transcript-style proxy)

See the companion dataset: chendren/cx-lam-trajectories

Model repo: chendren/qwen2.5-3b-cx-lam (3B base + LoRA)

How the LAM Works (100% Real Browser + VLM)

We use a narrow contract with two prompt styles:

Initial call: Uses the classic full-plan prefix ending at ### Execute\n[
Continuation / adaptation steps: Uses build_continuation_prompt that surfaces:
- Current observation
- Real visual context from VLM (mlx-vlm description of live screenshot of the rendered CRM UI)
- Memory summaries (past episodes with full traces)
- "### Completed so far"
- "### Reasoning and Next Action\n["
- Policy ensures create_case before log_call

The runtime loop (run_lam_session / --iterative):

RealBrowserExecutor takes screenshot and runs real VLM (mlx-vlm) for pixel-level description.
Model (with memory) proposes one action + explicit reasoning.
RealBrowserExecutor (Playwright) performs real clicks/fills on the live UI → gets results from the actual rendered page.
Results + fresh VLM visual + memory are fed back.
Repeat until log_call succeeds with proper state or the model emits nothing.

No Python mocks or internal state simulation. The browser + backend hold the truth. Vision = real VLM on screenshots. Actions = real DOM interactions. Full traces (with VLM descs + reasoning) are stored for training and retrieval.

Evaluation / Test Results (Post-Enhancements)

100 Model Tests (single-shot on held-out test set, n=100)

valid_rate: 1.0 (valid + ≥3 actions)
parse_success_rate: 1.0
model_drove_rate: 1.0
avg_actions_per_example: 3.0
all_valid: True

100% of generations produced valid, model-driven, multi-step (exactly 3) structured action sequences.

2 Full Real Adaptive Runs (exact `--adaptive --iterative --vision --memory --hybrid`)

Priya renewal seed: terminal=True, steps=3, cases_created=1, logs_created=1, full trace written with VLM visuals. exit code 0.
Alex billing seed: terminal=True, steps=3, cases_created=1, logs_created=1, full trace written with VLM visuals + policy override used. exit code 0.

Both runs used real browser + real VLM per step + dense memory + explicit reasoning, and completed correctly.

100 Full Adaptive Loop Tests (real loop with VLM/memory/policy/trace)

full_success_rate (terminal + cases>=1 + logs>=1): 1.0
terminal_rate: 1.0
avg_steps: 3.0
All produced complete traceable sessions.

See scripts/validate_lam.py, 100_adaptive_tests.json and full logs for details. Real VLM descriptions and policy enforcement were critical to 100% success.

Download & Tracking

Hugging Face only counts downloads when a query file (by default config.json, or a library-specific file) is requested via the Hub.

This repo now includes a root config.json so that full downloads are tracked.

Use these methods to ensure your downloads are counted:

# CLI (full snapshot)
huggingface-cli download chendren/qwen2.5-3b-cx-lam --local-dir qwen-lam

# Python (recommended - fetches config.json + adapters)
from huggingface_hub import snapshot_download
model_dir = snapshot_download("chendren/qwen2.5-3b-cx-lam")
# Now contains: config.json, adapter_config.json, adapters.safetensors, ...

# Or via the LAM package helper (guarantees tracking + convenient):
from lam.inference import ensure_lam_downloaded
lam_dir = ensure_lam_downloaded("chendren/qwen2.5-3b-cx-lam")  # hits config.json


**Note:** Direct raw downloads or `git clone` may not increment the counter. Use `snapshot_download` / `huggingface-cli` for accurate tracking. The badge above reflects tracked downloads.

To bootstrap / force visible download stats (owner runs can be throttled), use the included helper:
```bash
python scripts/force_download_hits.py --hits 10

This repeatedly hits config.json (the primary query file) + other metadata via the official client.

For even better native support for all MLX models (including custom adapters), a small addition to the Hub's library registry is proposed (see mlx-library-registration.patch in this repo). This registers library_name: mlx with explicit countDownloads query on config + adapter files.

Load the model (MLX + LoRA)

from mlx_lm import load
from huggingface_hub import snapshot_download

model_dir = snapshot_download("chendren/qwen2.5-3b-cx-lam")
model, tokenizer = load(
    "mlx-community/Qwen2.5-3B-Instruct-4bit",
    adapter_path=f"{model_dir}/adapters.safetensors"
)

Usage (100% Real Browser + VLM)

from lam.inference import run_lam_session

sess = run_lam_session(
    "Customer Jordan Lee asks about renewal and add-on pricing for the contract",
    one_action_per_step=True,
    max_steps=6,
    use_vision=True,
    use_memory=True,
    hybrid_dialogue=True,
    # hf_repo="chendren/qwen2.5-3b-cx-lam",  # uncomment to auto-download via snapshot (ensures tracking via config.json)
)

print("Steps:", sess["num_steps"], "terminal:", sess["terminal"])
for s in sess["steps"]:
    print("  Actions:", s["actions"])
    print("  Visual (VLM):", s.get("visual_context", "")[:150])
    if s.get("reasoning"):
        print("  Reasoning:", s["reasoning"][:80])

CLI (real browser + VLM):

PYTHONPATH=. python3 scripts/lam_infer.py --adaptive --iterative --vision --memory --hybrid \
  "Customer Jordan Lee (VIP) asks about renewal and add-on pricing"

To watch the browser:

... --no-headless

Server must be running (node server.js).

Sequence Diagram: One Full Adaptive Test (Priya Renewal Seed)

This is the exact flow for one representative test (Priya Patel renewal discussion + add-on pricing):

PNG version also rendered:

(The diagrams above show the detailed flow for one representative test run. Source: diagrams/one_test_sequence.mmd)

sequenceDiagram
    autonumber
    participant User as User/CLI
    participant Infer as lam_infer.py
    participant Sess as run_lam_session
    participant Exec as RealBrowserExecutor
    participant VLM as VLM (mlx-vlm)
    participant Mem as EpisodeMemory
    participant Pol as session_policy
    participant Mod as Model (Qwen + LoRA)
    participant FS as Filesystem (traces/)

    User->>Infer: python scripts/lam_infer.py --adaptive --iterative --vision --memory --hybrid "Priya renewal..."
    Infer->>Sess: run_lam_session(obs, use_vision=True, use_memory=True, ...)

    Note over Sess,Exec: Preload (once per process)
    Sess->>Exec: RealBrowserExecutor(reuse=True)
    Sess->>VLM: preload model
    Sess->>Mem: get_memory()

    loop Until terminal (may_terminate)
        Sess->>Exec: get_visual_context()
        Exec->>Exec: page.screenshot()
        Exec->>VLM: describe_screenshot(png)
        VLM-->>Exec: VLM_REAL_IMAGE_DESC: CONTACTS: ... | UI_STATE: ...
        Exec-->>Sess: visual_ctx + screenshot_path

        Sess->>Mem: get_summary_for_prompt() + get_rich_examples_for_prompt()
        Mem-->>Sess: mem_summary + "Past similar traces (use as guide...)"
        Sess->>Sess: build_continuation_prompt(obs + visual + mem + reasoning + completed)

        Sess->>Mod: generate_lam_action(continuation_prompt)
        Mod-->>Sess: { "reasoning": "...", "execute": [ {tool: "crm.screenpop", ...} ] }

        Sess->>Pol: may_terminate(steps, results)? or override early log_call
        Pol-->>Sess: actions (or forced create_case)

        Sess->>Exec: execute_sequence(actions)
        Exec->>Exec: real Playwright (fill, click, wait)
        Exec-->>Sess: results (ok, performed, etc.)

        Sess->>Sess: _generate_dialogue(actions, results)
        Sess->>Sess: append step_rec (visual, reasoning, dialogue, results)
        Sess->>Mem: add_episode(..., trace=steps, extra={reasoning, visual})

        Sess->>Pol: may_terminate(steps)?
        alt yes
            Sess->>Sess: break
        end

        Sess->>Sess: build_adaptation_observation (for next)
        Sess->>Exec: get_visual_context()  (fresh VLM)
        Sess->>Mem: get_summary...
    end

    Sess->>Pol: count_final_state(steps)
    Pol-->>Sess: {"cases_created": 1, "logs_created": 1}

    Sess->>FS: write lam_trace_....json (full steps + VLM + reasoning)
    Sess-->>Infer: sess dict

    Infer->>Infer: print("=== LAM CLOSED-LOOP SESSION ===")
    Infer->>Infer: for each step: print OBS, VISUAL(VLM), MEMORY, MODEL ACTIONS, DIALOGUE, EXEC RESULTS
    Infer->>Infer: print Final state, FULL TRACE STORED, exit code: 0

    Infer-->>User: terminal=True, cases=1, logs=1, trace path

Key real components exercised in this test:

Real VLM on every step (screenshots + describe)
Dense memory + rich trace examples injected
Policy override / may_terminate
Real Playwright execution + results feedback
Full trace written with VLM + reasoning
100% success: 3 steps, 1 case + 1 log, terminal

Limitations

Base model (3B) limits long-horizon planning on complex or novel cases.
Requires local CRM server + Playwright + MLX models (VLM ~2-4GB, LAM).
VLM descriptions can be concise; full richness comes from the loop + memory.
Research artifact. Not production-ready without larger base model + more real traces.

Citation / Related

Trained on public CX data + self-generated real traces. See companion dataset.

Trained and evaluated entirely locally on Apple Silicon with MLX + real browser + VLM.

Key components: lam/inference.py:run_lam_session, lam/executor.py, lam/vision.py, lam/session_policy.py, scripts/capture_verif.sh

Downloads last month: 93

MLX

Hardware compatibility

Quantized

Model tree for chendren/qwen2.5-3b-cx-lam

Base model

Qwen/Qwen2.5-3B

Finetuned

mlx-community/Qwen2.5-3B-Instruct-4bit

Adapter

(1)

this model