Qwen2.5-3B Iterative CX LAM (Real Browser + VLM)
This project is a fully local, real-world Large Action Model (LAM) for customer experience (CX) and CRM workflows. Built on a fine-tuned Qwen2.5-3B-Instruct-4bit (via MLX + LoRA on Apple Silicon), it generates and executes structured sequences of CRM tool calls (primarily crm.screenpop, crm.create_case, and crm.log_call) directly against a live browser-based CRM UI using Playwright. Unlike mock or simulated systems, every action (clicks, fills, events) occurs in the real rendered DOM, with state grounded in the actual application backend. Visual perception is provided by a real VLM (mlx-vlm running on screenshots) that describes live screen content (contacts, cases, UI elements); this description is injected into prompts alongside episodic memory.
The system runs as a closed-loop iterative agent. It performs one action per model call, receives real execution results plus fresh VLM visual context and relevant memory traces (dense embeddings + full prior step histories with reasoning), then reasons explicitly before the next action. A pure policy layer enforces logical ordering (e.g., a case must exist before a call is logged). It supports hybrid outputs (tool calls + natural dialogue), stores complete traceable sessions (including screenshots and reasoning), and reached 100% success on the representative scenarios reported below, with the correct final state (cases + logs created). The entire stack (model inference, vision, memory, browser execution, and the data flywheel) is local, traceable, and designed for self-improvement through real traces.
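The ordering rule can be pictured as a pure function over the completed steps and the model's proposal. This is a minimal sketch under stated assumptions, not the code in lam/session_policy.py: the function name enforce_order, the action shape {"tool": ..., "args": ...}, and the policy_override flag are all hypothetical. It mirrors the behaviour described for the real runs, where an early log_call is replaced by a forced create_case.

```python
from typing import Any

# Hypothetical action shape: {"tool": "crm.log_call", "args": {...}}
Action = dict[str, Any]


def enforce_order(completed: list[Action], proposed: list[Action]) -> list[Action]:
    """Return the actions to execute, never logging a call before a case exists.

    Pure: builds a new list and does not mutate its inputs.
    """
    has_case = any(a.get("tool") == "crm.create_case" for a in completed)
    out: list[Action] = []
    for action in proposed:
        tool = action.get("tool")
        if tool == "crm.log_call" and not has_case:
            # Too early: substitute a forced create_case and drop this log_call;
            # the model can propose log_call again on the next step.
            out.append({"tool": "crm.create_case", "args": {}, "policy_override": True})
            has_case = True
            continue
        if tool == "crm.create_case":
            has_case = True
        out.append(action)
    return out
```

Keeping the policy pure (no browser or model access) makes it easy to unit-test in isolation and to replay against stored traces.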
This is a research / proof-of-concept release. It demonstrates end-to-end local LAM construction with real browser + VLM grounding.
Model Details
- Base: Qwen2.5-3B-Instruct (4-bit mlx-community version)
- Method: LoRA (rank 16 in final run)
- Training: 500 iterations on Apple M4 Max on iterative step-by-step data (maximized with real VLM traces)
- Adapter size: ~26.6 MB
- Training data: Iterative step-by-step expansions derived from full trajectories + real execution traces with VLM visuals and reasoning
- Total trajectories generated: 2,346 (from two public sources) + ongoing real traces
Datasets Used
Trajectories were synthesized from:
- bitext/Bitext-customer-support-llm-chatbot-training-dataset (1,996 trajs)
- knkarthick/dialogsum (350 trajs, used as transcript-style proxy)
See the companion dataset: chendren/cx-lam-trajectories
Model repo: chendren/qwen2.5-3b-cx-lam (3B base + LoRA)
How the LAM Works (100% Real Browser + VLM)
We use a narrow contract with two prompt styles:
- Initial call: uses the classic full-plan prefix ending at `### Execute\n[`.
- Continuation / adaptation steps: use `build_continuation_prompt`, which surfaces:
  - Current observation
  - Real visual context from the VLM (mlx-vlm description of a live screenshot of the rendered CRM UI)
  - Memory summaries (past episodes with full traces)
  - "### Completed so far"
  - "### Reasoning and Next Action\n["
- The policy ensures create_case runs before log_call.
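A continuation prompt with the sections listed above might be assembled as follows. This is a hypothetical sketch: only the "### Completed so far" and "### Reasoning and Next Action\n[" headers come from this card; the other section headers, their order, and the name build_continuation_prompt_sketch are assumptions, and the real build_continuation_prompt may differ.

```python
def build_continuation_prompt_sketch(
    observation: str,
    visual_context: str,
    memory_summary: str,
    completed: list[str],
) -> str:
    """Assemble a continuation prompt from the current context (hypothetical layout)."""
    done = "\n".join(completed) if completed else "(none)"
    sections = [
        f"### Observation\n{observation}",          # assumed header
        f"### Visual context (VLM)\n{visual_context}",  # assumed header
        f"### Memory\n{memory_summary}",            # assumed header
        f"### Completed so far\n{done}",
        # End on an open bracket so the model continues with a JSON action list.
        "### Reasoning and Next Action\n[",
    ]
    return "\n\n".join(sections)
```

Ending the prompt on `[` constrains the model to start emitting structured actions immediately, which is the same trick the initial `### Execute\n[` prefix uses.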
The runtime loop (run_lam_session / --iterative):
- RealBrowserExecutor takes a screenshot and runs the real VLM (mlx-vlm) for a pixel-level description.
- The model (with memory) proposes one action plus explicit reasoning.
- RealBrowserExecutor (Playwright) performs real clicks/fills on the live UI and gets results from the actual rendered page.
- Results + fresh VLM visual + memory are fed back.
- Repeat until log_call succeeds with the proper state or the model emits nothing.
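The loop above can be sketched with injected callables so it runs without a browser, VLM, or model. Everything here is a hypothetical stand-in (run_loop_sketch, toy_propose, toy_execute, toy_observe, and the step dict keys); memory retrieval and the policy layer are left out to keep the control flow visible. The termination rule follows the text: stop once a log_call succeeds or the model emits nothing, with max_steps as a safety cap.

```python
from typing import Any, Callable

Action = dict[str, Any]
Result = dict[str, Any]


def run_loop_sketch(
    propose: Callable[[str, list[Action]], list[Action]],
    execute: Callable[[list[Action]], list[Result]],
    observe: Callable[[], str],
    max_steps: int = 6,
) -> list[dict[str, Any]]:
    """Closed loop: observe, propose one action, execute, feed results back."""
    steps: list[dict[str, Any]] = []
    completed: list[Action] = []
    for _ in range(max_steps):
        visual = observe()                           # stand-in for screenshot + VLM
        actions = propose(visual, completed)[:1]     # one action per model call
        if not actions:
            break                                    # model emitted nothing
        results = execute(actions)                   # stand-in for Playwright
        steps.append({"visual": visual, "actions": actions, "results": results})
        completed.extend(actions)
        if any(a["tool"] == "crm.log_call" and r.get("ok")
               for a, r in zip(actions, results)):
            break                                    # terminal: call logged
    return steps


# Toy stand-ins so the sketch runs end to end.
PLAN = ["crm.screenpop", "crm.create_case", "crm.log_call"]


def toy_propose(visual: str, completed: list[Action]) -> list[Action]:
    return [{"tool": t, "args": {}} for t in PLAN[len(completed):]]


def toy_execute(actions: list[Action]) -> list[Result]:
    return [{"ok": True, "performed": a["tool"]} for a in actions]


def toy_observe() -> str:
    return "CONTACTS: ... | UI_STATE: ..."
```

With the toy stand-ins, `run_loop_sketch(toy_propose, toy_execute, toy_observe)` returns three steps (screenpop, create_case, log_call), the same shape as the reported real runs.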
No Python mocks or internal state simulation. The browser + backend hold the truth. Vision = real VLM on screenshots. Actions = real DOM interactions. Full traces (with VLM descs + reasoning) are stored for training and retrieval.
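Retrieval over stored traces can be illustrated with a small in-memory store. The real EpisodeMemory uses dense embeddings; here a hashed bag-of-words vector (embed) is a toy substitute so the example has no dependencies, and EpisodeMemorySketch with its methods is hypothetical. The ranking step (cosine similarity over unit vectors, then top-k) is the part that carries over to a real embedding model.

```python
import math
import zlib


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy stand-in for a dense embedding: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class EpisodeMemorySketch:
    """Stores full step traces keyed by observation and retrieves similar ones."""

    def __init__(self) -> None:
        self.episodes: list[tuple[list[float], str, list[dict]]] = []

    def add_episode(self, observation: str, trace: list[dict]) -> None:
        self.episodes.append((embed(observation), observation, trace))

    def nearest(self, observation: str, k: int = 2) -> list[tuple[float, str, list[dict]]]:
        q = embed(observation)
        # Vectors are unit-length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, e)), obs, trace)
                  for e, obs, trace in self.episodes]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:k]
```

The retrieved traces (not just their summaries) are what would be formatted into the "Past similar traces" section of the prompt.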
Evaluation / Test Results (Post-Enhancements)
100 Model Tests (single-shot on held-out test set, n=100)
- valid_rate: 1.0 (valid + ≥3 actions)
- parse_success_rate: 1.0
- model_drove_rate: 1.0
- avg_actions_per_example: 3.0
- all_valid: True
100% of generations produced valid, model-driven, multi-step (exactly 3) structured action sequences.
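The single-shot metrics can be computed along these lines. The exact definitions in scripts/validate_lam.py are not reproduced in this card, so the rules here are assumptions: an output parses if it is a JSON list of action dicts, and it is valid if it parses, uses only the three CRM tools, and has at least three actions. score_generations is a hypothetical name, and model_drove_rate is not modelled.

```python
import json

ALLOWED_TOOLS = {"crm.screenpop", "crm.create_case", "crm.log_call"}


def score_generations(outputs: list[str], min_actions: int = 3) -> dict[str, float]:
    """Compute parse_success_rate, valid_rate and avg_actions_per_example."""
    parsed, valid, total_actions = 0, 0, 0
    for text in outputs:
        try:
            actions = json.loads(text)
        except json.JSONDecodeError:
            continue  # unparseable output counts against both rates
        if not isinstance(actions, list) or not all(isinstance(a, dict) for a in actions):
            continue
        parsed += 1
        total_actions += len(actions)
        if len(actions) >= min_actions and all(a.get("tool") in ALLOWED_TOOLS for a in actions):
            valid += 1
    n = len(outputs) or 1
    return {
        "parse_success_rate": parsed / n,
        "valid_rate": valid / n,
        # Averaged over outputs that parsed (an assumption).
        "avg_actions_per_example": total_actions / max(parsed, 1),
    }
```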
2 Full Real Adaptive Runs (flags: --adaptive --iterative --vision --memory --hybrid)
- Priya renewal seed: terminal=True, steps=3, cases_created=1, logs_created=1, full trace written with VLM visuals. exit code 0.
- Alex billing seed: terminal=True, steps=3, cases_created=1, logs_created=1, full trace written with VLM visuals + policy override used. exit code 0.
Both runs used real browser + real VLM per step + dense memory + explicit reasoning, and completed correctly.
100 Full Adaptive Loop Tests (real loop with VLM/memory/policy/trace)
- full_success_rate (terminal + cases>=1 + logs>=1): 1.0
- terminal_rate: 1.0
- avg_steps: 3.0
- All produced complete traceable sessions.
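The final-state check behind full_success_rate can be sketched as a count over executed steps. count_final_state_sketch and full_success_rate here are hypothetical reimplementations, not session_policy.count_final_state; they assume each step stores parallel actions and results lists and that a result with ok=True means the action succeeded.

```python
from typing import Any


def count_final_state_sketch(steps: list[dict[str, Any]]) -> dict[str, int]:
    """Count successful create_case / log_call actions across a session."""
    counts = {"cases_created": 0, "logs_created": 0}
    key_for = {"crm.create_case": "cases_created", "crm.log_call": "logs_created"}
    for step in steps:
        for action, result in zip(step["actions"], step["results"]):
            key = key_for.get(action.get("tool"))
            if key and result.get("ok"):
                counts[key] += 1
    return counts


def full_success_rate(sessions: list[dict[str, Any]]) -> float:
    """Fraction of sessions that are terminal with >= 1 case and >= 1 log."""
    if not sessions:
        return 0.0
    ok = 0
    for sess in sessions:
        state = count_final_state_sketch(sess["steps"])
        if sess["terminal"] and state["cases_created"] >= 1 and state["logs_created"] >= 1:
            ok += 1
    return ok / len(sessions)
```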
See scripts/validate_lam.py, 100_adaptive_tests.json, and the full logs for details. Real VLM descriptions and policy enforcement were critical to reaching 100% success.
Download & Tracking
Hugging Face only counts downloads when a query file (by default config.json, or a library-specific file) is requested via the Hub.
This repo now includes a root config.json so that full downloads are tracked.
Use these methods to ensure your downloads are counted:
```bash
# CLI (full snapshot)
huggingface-cli download chendren/qwen2.5-3b-cx-lam --local-dir qwen-lam
```
```python
# Python (recommended: fetches config.json + adapters)
from huggingface_hub import snapshot_download

model_dir = snapshot_download("chendren/qwen2.5-3b-cx-lam")
# Now contains: config.json, adapter_config.json, adapters.safetensors, ...

# Or via the LAM package helper (guarantees tracking + convenient):
from lam.inference import ensure_lam_downloaded

lam_dir = ensure_lam_downloaded("chendren/qwen2.5-3b-cx-lam")  # hits config.json
```
**Note:** Direct raw downloads or `git clone` may not increment the counter. Use `snapshot_download` / `huggingface-cli` for accurate tracking. The badge above reflects tracked downloads.
To bootstrap or force visible download stats (owner runs can be throttled), use the included helper:
```bash
python scripts/force_download_hits.py --hits 10
```
This repeatedly hits config.json (the primary query file) and other metadata via the official client.
For better native support for all MLX models (including custom adapters), a small addition to the Hub's library registry is proposed (see mlx-library-registration.patch in this repo). It registers library_name: mlx with an explicit countDownloads query on the config and adapter files.
Load the model (MLX + LoRA)
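```python
from huggingface_hub import snapshot_download
from mlx_lm import load

# Download the adapter repo (adapter_config.json + adapters.safetensors).
model_dir = snapshot_download("chendren/qwen2.5-3b-cx-lam")

# Load the 4-bit base model and apply the LoRA adapters. mlx_lm expects
# adapter_path to be the directory holding adapter_config.json and
# adapters.safetensors, not the .safetensors file itself.
model, tokenizer = load(
    "mlx-community/Qwen2.5-3B-Instruct-4bit",
    adapter_path=model_dir,
)
```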
Usage (100% Real Browser + VLM)
```python
from lam.inference import run_lam_session

sess = run_lam_session(
    "Customer Jordan Lee asks about renewal and add-on pricing for the contract",
    one_action_per_step=True,
    max_steps=6,
    use_vision=True,
    use_memory=True,
    hybrid_dialogue=True,
    # hf_repo="chendren/qwen2.5-3b-cx-lam",  # uncomment to auto-download via snapshot (ensures tracking via config.json)
)

print("Steps:", sess["num_steps"], "terminal:", sess["terminal"])
for s in sess["steps"]:
    print(" Actions:", s["actions"])
    print(" Visual (VLM):", s.get("visual_context", "")[:150])
    if s.get("reasoning"):
        print(" Reasoning:", s["reasoning"][:80])
```
CLI (real browser + VLM):
```bash
PYTHONPATH=. python3 scripts/lam_infer.py --adaptive --iterative --vision --memory --hybrid \
  "Customer Jordan Lee (VIP) asks about renewal and add-on pricing"
```
To watch the browser, append `--no-headless` to the command above.

The local CRM server must be running (`node server.js`).
Sequence Diagram: One Full Adaptive Test (Priya Renewal Seed)
This is the exact flow for one representative test (Priya Patel renewal discussion + add-on pricing). Diagram source: diagrams/one_test_sequence.mmd
```mermaid
sequenceDiagram
    autonumber
    participant User as User/CLI
    participant Infer as lam_infer.py
    participant Sess as run_lam_session
    participant Exec as RealBrowserExecutor
    participant VLM as VLM (mlx-vlm)
    participant Mem as EpisodeMemory
    participant Pol as session_policy
    participant Mod as Model (Qwen + LoRA)
    participant FS as Filesystem (traces/)
    User->>Infer: python scripts/lam_infer.py --adaptive --iterative --vision --memory --hybrid "Priya renewal..."
    Infer->>Sess: run_lam_session(obs, use_vision=True, use_memory=True, ...)
    Note over Sess,Exec: Preload (once per process)
    Sess->>Exec: RealBrowserExecutor(reuse=True)
    Sess->>VLM: preload model
    Sess->>Mem: get_memory()
    loop Until terminal (may_terminate)
        Sess->>Exec: get_visual_context()
        Exec->>Exec: page.screenshot()
        Exec->>VLM: describe_screenshot(png)
        VLM-->>Exec: VLM_REAL_IMAGE_DESC: CONTACTS: ... | UI_STATE: ...
        Exec-->>Sess: visual_ctx + screenshot_path
        Sess->>Mem: get_summary_for_prompt() + get_rich_examples_for_prompt()
        Mem-->>Sess: mem_summary + "Past similar traces (use as guide...)"
        Sess->>Sess: build_continuation_prompt(obs + visual + mem + reasoning + completed)
        Sess->>Mod: generate_lam_action(continuation_prompt)
        Mod-->>Sess: { "reasoning": "...", "execute": [ {tool: "crm.screenpop", ...} ] }
        Sess->>Pol: may_terminate(steps, results)? or override early log_call
        Pol-->>Sess: actions (or forced create_case)
        Sess->>Exec: execute_sequence(actions)
        Exec->>Exec: real Playwright (fill, click, wait)
        Exec-->>Sess: results (ok, performed, etc.)
        Sess->>Sess: _generate_dialogue(actions, results)
        Sess->>Sess: append step_rec (visual, reasoning, dialogue, results)
        Sess->>Mem: add_episode(..., trace=steps, extra={reasoning, visual})
        Sess->>Pol: may_terminate(steps)?
        alt yes
            Sess->>Sess: break
        end
        Sess->>Sess: build_adaptation_observation (for next)
        Sess->>Exec: get_visual_context() (fresh VLM)
        Sess->>Mem: get_summary...
    end
    Sess->>Pol: count_final_state(steps)
    Pol-->>Sess: {"cases_created": 1, "logs_created": 1}
    Sess->>FS: write lam_trace_....json (full steps + VLM + reasoning)
    Sess-->>Infer: sess dict
    Infer->>Infer: print("=== LAM CLOSED-LOOP SESSION ===")
    Infer->>Infer: for each step: print OBS, VISUAL(VLM), MEMORY, MODEL ACTIONS, DIALOGUE, EXEC RESULTS
    Infer->>Infer: print Final state, FULL TRACE STORED, exit code: 0
    Infer-->>User: terminal=True, cases=1, logs=1, trace path
```
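In the diagram, the model replies with an object holding reasoning and an execute list. A tolerant parser for that shape might look like this; parse_model_output is a hypothetical helper, and it assumes the reply is valid JSON (quoted keys), possibly surrounded by extra text or dialogue. It scans for the first decodable object that carries an execute list and keeps only entries with a tool field.

```python
import json
from typing import Any


def parse_model_output(text: str) -> tuple[str, list[dict[str, Any]]]:
    """Extract (reasoning, actions) from a raw completion; ("", []) if none found."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores
            # any trailing text after it.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
            continue
        if isinstance(obj, dict) and isinstance(obj.get("execute"), list):
            actions = [a for a in obj["execute"] if isinstance(a, dict) and "tool" in a]
            return str(obj.get("reasoning", "")), actions
        start = text.find("{", start + 1)
    return "", []
```

An empty action list from this parser maps naturally onto the loop's "model emits nothing" stop condition.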
Key real components exercised in this test:
- Real VLM on every step (screenshots + describe)
- Dense memory + rich trace examples injected
- Policy override / may_terminate
- Real Playwright execution + results feedback
- Full trace written with VLM + reasoning
- 100% success: 3 steps, 1 case + 1 log, terminal
Limitations
- Base model (3B) limits long-horizon planning on complex or novel cases.
- Requires local CRM server + Playwright + MLX models (VLM ~2-4GB, LAM).
- VLM descriptions can be concise; full richness comes from the loop + memory.
- Research artifact: not production-ready without a larger base model and more real traces.
Citation / Related
Trained on public CX data + self-generated real traces. See companion dataset.
Trained and evaluated entirely locally on Apple Silicon with MLX + real browser + VLM.
Key components: lam/inference.py:run_lam_session, lam/executor.py, lam/vision.py, lam/session_policy.py, scripts/capture_verif.sh