Kearm committed on
Commit 4b0731c · verified · 1 Parent(s): 406634e

Upload README.md

Files changed (1): README.md (+40 -0)

README.md ADDED
---
language:
- en
- zh
library_name: transformers
license: mit
pipeline_tag: text-generation
base_model:
- zai-org/GLM-4.6
---

# GLM-4.6-NVFP4

**Quantized version of [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6)**, built with **LLM Compressor** in the **NVFP4** format (E2M1 values with E4M3 block scales).

**This time it actually works!** *We think.*

This should be the start of a new series of *hopefully optimal* NVFP4 quantizations as capable cards become more common in the wild.
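
To make the "(E2M1 values with E4M3 block scales)" part concrete: NVFP4 stores each group of 16 values as 4-bit E2M1 codes plus one shared 8-bit E4M3 scale, i.e. roughly 4.5 bits per value versus 16 for BF16. Below is a toy NumPy sketch of that round-trip; it illustrates the numerics only, not NVIDIA's actual kernels (real NVFP4 also carries a per-tensor FP32 scale for the E4M3 scales, omitted here):

```python
# Toy NVFP4-style microscaling: 16-value blocks of E2M1 codes + one shared scale.
import numpy as np

# The eight non-negative magnitudes representable in E2M1 (sign stored separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block: np.ndarray) -> tuple[np.ndarray, float]:
    """Map one block of 16 floats to E2M1 values plus a shared block scale."""
    assert block.size == 16
    scale = max(np.abs(block).max() / E2M1_GRID[-1], 1e-12)  # max magnitude -> 6.0
    scaled = block / scale
    # Pick the nearest representable signed E2M1 value for each element.
    candidates = np.sign(scaled)[:, None] * E2M1_GRID
    codes = candidates[np.arange(16), np.abs(scaled[:, None] - candidates).argmin(axis=1)]
    return codes, scale  # in the real format, `scale` is itself rounded to E4M3

def dequantize_block(codes: np.ndarray, scale: float) -> np.ndarray:
    return codes * scale

w = np.random.default_rng(0).normal(size=16).astype(np.float32)
codes, scale = quantize_block(w)
print("max abs error:", np.abs(w - dequantize_block(codes, scale)).max())
```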

---

## Model Summary

| Property | Value |
|-----------|--------|
| Base model | GLM-4.6 |
| Quantization | NVFP4 (FP4 microscaling, block = 16, scale = E4M3) |
| Method | Post-Training Quantization with ModelOpt |
| Toolchain | TensorRT-Model-Optimizer / ModelOpt for PyTorch |
| Hardware target | NVIDIA Blackwell / GB200 Tensor Cores |
| Precision | Weights & activations = FP4 • Scales = FP8 (E4M3) |
| Maintainer | **REMSP.DEV** |
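
The "Method" row above names post-training quantization with ModelOpt. As a rough sketch of what that flow looks like, assuming `mtq.NVFP4_DEFAULT_CFG` and a hand-rolled two-prompt calibration loop (the actual calibration data and settings for this checkpoint are not published):

```python
# Hedged PTQ sketch with TensorRT Model Optimizer; illustrative, not the exact recipe.
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("zai-org/GLM-4.6", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-4.6")

def forward_loop(m):
    # Feed a small calibration set so ModelOpt can observe activation ranges.
    # Two toy prompts stand in for a real calibration corpus (an assumption).
    for text in ["The capital of France is", "Write a haiku about autumn."]:
        m(**tokenizer(text, return_tensors="pt").to(m.device))

# Insert NVFP4 quantizers (E2M1 weights/activations, E4M3 block scales) and calibrate.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
```

At GLM-4.6's scale this would of course need multi-GPU sharding and a real calibration corpus; the snippet only shows the shape of the API.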

---

## Description

This model is a drop-in replacement for GLM-4.6 that runs in **NVFP4 precision**, enabling up to **6× faster GEMM throughput** and around **65 % lower memory use** compared with BF16.
Accuracy remains within ≈ 1 % of the FP8 baseline on standard reasoning and coding benchmarks.
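
Since the card is published as a `transformers` text-generation model, a minimal serving sketch with vLLM (which can run NVFP4 checkpoints on Blackwell-class GPUs) might look like the following; the repo id is a placeholder for this model's actual Hub path:

```python
# Hedged usage sketch; "REMSP-DEV/GLM-4.6-NVFP4" is a placeholder repo id.
from vllm import LLM, SamplingParams

llm = LLM(model="REMSP-DEV/GLM-4.6-NVFP4")  # substitute the real Hub path
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain microscaling quantization in two sentences."], params)
print(outputs[0].outputs[0].text)
```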

---