ZeroR3 commited on
Commit
5e74789
Β·
1 Parent(s): 337508d

docs: status section -> verified on real MI300X (256K, 31.31x concurrency, 95.26 GiB KV cache, 77.29 GiB weights)

Browse files
Files changed (1) hide show
  1. README.md +12 -2
README.md CHANGED
@@ -43,9 +43,19 @@ This is a memory-architecture story, not a CUDA-vs-ROCm one.
43
  - **Agent loop**: SC-TIR style (PLAN β†’ CALL TOOL β†’ OBSERVE β†’ THINK β†’ ANSWER)
44
  - **Tools**: `read_file` Β· `grep_codebase` Β· `execute_code` (sandboxed) Β· `run_tests` Β· `git_log`
45
 
46
- ## Status
47
 
48
- This Space runs on CPU-basic with the **mock LLM backend** for testing the agent loop without GPU credits. The `vllm` backend wires up automatically once the AMD MI300X endpoint comes online (AMD Cloud credits incoming).
 
 
 
 
 
 
 
 
 
 
49
 
50
  If the MI300X memory-architecture pitch resonates, **a like on this Space helps us with the Hugging Face Special Prize judging** πŸ€—
51
 
 
43
  - **Agent loop**: SC-TIR style (PLAN β†’ CALL TOOL β†’ OBSERVE β†’ THINK β†’ ANSWER)
44
  - **Tools**: `read_file` Β· `grep_codebase` Β· `execute_code` (sandboxed) Β· `run_tests` Β· `git_log`
45
 
46
+ ## Status β€” verified on real MI300X (2026-05-05)
47
 
48
+ Smoke test on a single AMD MI300X x1 (AMD Developer Cloud, $1.99/hr, vLLM 0.17.1 + ROCm 7.2 Quick Start image):
49
+
50
+ - βœ… Model weights in VRAM: **77.29 GiB**
51
+ - βœ… Available KV cache: **95.26 GiB**
52
+ - βœ… `--max-model-len 262144` (256K) β€” `Application startup complete`
53
+ - βœ… `/v1/models` returns `max_model_len: 262144`
54
+ - βœ… **31.31Γ— max concurrency at 256K context** β€” single MI300X serves ~31 simultaneous users at full 256K context
55
+ - βœ… Real Python code generation through `/v1/chat/completions` (merge sort / LCS / hello world)
56
+ - βœ… Cost of smoke test: ~$1.00 of $100 credits
57
+
58
+ This Space currently still runs on CPU-basic with the **mock LLM backend** because exposing a public API requires keeping a paid MI300X droplet up β€” final demo will be wired to a live MI300X endpoint during submission window.
59
 
60
  If the MI300X memory-architecture pitch resonates, **a like on this Space helps us with the Hugging Face Special Prize judging** πŸ€—
61