Strix Halo build of GLM-5.2-MXFP4 and simple benchmarks
Hello, thank you for making GLM 5.2 MXFP4 available. Could you share a build or instructions specific to Strix Halo, along with some benchmark numbers so that others can reproduce the results? I would like to run this on Strix Halo. Since most of us use this machine as a workstation as well as for inference, a version that also runs on Windows or WSL would be greatly appreciated.
Some of the things that would be most helpful:
Install or build: prebuilt artifacts, or a short script showing the exact conversion and packaging steps for Strix Halo.
Runtime specifics: versions of the operating system, drivers, runtime, compiler, and libraries used.
Quantization: the precise flags or commands you used and the quant modes you tested.
Notes on compatibility: any necessary model modifications, operator adjustments, or unique runtime flags.
Benchmarks: numbers for sustained runs and, if at all possible, power draw or thermal notes.
Exact commands: the commands you used for loading, warmup, and timed runs, plus any environment variables you set.
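To be clear about what I mean by warmup and timed runs, here is a minimal sketch of the kind of harness I have in mind. It is not tied to any runtime: `generate` is a hypothetical callable standing in for whatever actually runs the model, and `fake_generate` is a dummy so the script runs on its own. Numbers produced in roughly this shape (untimed warmup, then several timed runs reported as tokens per second) would be easy for the rest of us to compare.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(generate: Callable[[], int], warmup: int = 2, runs: int = 5) -> Dict[str, float]:
    """Time `generate`, which must return the number of tokens it produced.

    `generate` is a hypothetical stand-in for a real inference call.
    """
    # Untimed warmup passes so caches, kernels, and clocks settle first.
    for _ in range(warmup):
        generate()

    rates: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate()
        elapsed = time.perf_counter() - start
        rates.append(tokens / elapsed)  # tokens per second for this run

    return {
        "mean_tok_s": statistics.mean(rates),
        "min_tok_s": min(rates),
        "max_tok_s": max(rates),
    }


def fake_generate() -> int:
    # Dummy workload (assumption): pretend to produce 32 tokens in about 10 ms.
    time.sleep(0.01)
    return 32


if __name__ == "__main__":
    print(benchmark(fake_generate))
```

Swapping `fake_generate` for a real call that returns its generated-token count is all it would take to reuse this.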
NOTE: This would make it much easier for me to decide whether to buy more of these machines, and I’m sure I’m not the only one.