Carmenest committed 927092d (parent 3ed377e): Upload README.md with huggingface_hub

Files changed (1): README.md (+48 -3)

The commit replaces the previous three-line front matter, which declared license: agpl-3.0, with the following content:

---
license: apache-2.0
language:
- en
tags:
- diffusion
- llada
- gguf
- diffuse-cpp
base_model: GSAI-ML/LLaDA-8B-Instruct
quantized_by: Carmenest
pipeline_tag: text-generation
---

# LLaDA-8B-Instruct GGUF

GGUF-quantized versions of [GSAI-ML/LLaDA-8B-Instruct](https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct) for use with [diffuse-cpp](https://github.com/iafiscal1212/diffuse-cpp).

LLaDA is a **diffusion language model**: it generates text by iteratively unmasking a sequence of masked tokens rather than by autoregressive, token-by-token prediction.

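To make "iterative unmasking" concrete, here is a toy shell sketch of one common schedule, low-confidence remasking (the `low_confidence` scheduler in the benchmark below appears to refer to this strategy). It is not diffuse-cpp code: the "model" is faked with fixed guesses and hypothetical confidence scores (`guesses`, `conf`), whereas a real diffusion model re-predicts every masked position at every step. Each step commits the most confident masked positions and leaves the rest masked.

```bash
#!/usr/bin/env bash
# Toy illustration of iterative unmasking with low-confidence remasking.
# ASSUMPTION: the "model" is faked. guesses[] and conf[] are hypothetical
# fixed predictions; a real diffusion LM re-predicts every masked position
# at every step.
set -eu

guesses=(Paris is the capital of France)
conf=(90 55 70 95 60 80)                  # hypothetical confidence per position (%)
n=${#guesses[@]}
steps=3
per_step=$(( (n + steps - 1) / steps ))   # positions committed per step

# Start from a fully masked sequence.
tokens=()
for ((i = 0; i < n; i++)); do tokens[i]="[MASK]"; done

for ((s = 1; s <= steps; s++)); do
  # Rank the still-masked positions by confidence, highest first,
  # and commit the top per_step of them; the rest stay masked.
  picks=$(
    for ((i = 0; i < n; i++)); do
      if [[ ${tokens[i]} == "[MASK]" ]]; then echo "${conf[i]} $i"; fi
    done | sort -rn | head -n "$per_step" | cut -d' ' -f2
  )
  for i in $picks; do tokens[i]=${guesses[i]}; done
  echo "step $s: ${tokens[*]}"
done
```

Running it prints the sequence after each of the three steps, ending with `step 3: Paris is the capital of France`. Fewer steps means more positions committed per step, which is the basic speed/quality trade-off of diffusion decoding.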
## Available Quantizations

| File | Quant | Size | Description |
|------|-------|------|-------------|
| llada-8b-q4km.gguf | Q4_K_M | 5.1 GB | **Recommended**: best throughput |
| llada-8b-q8_0.gguf | Q8_0 | 8.4 GB | High quality, good throughput |
| llada-8b-f16.gguf | F16 | 14.9 GB | Unquantized 16-bit reference |

## Benchmark (24-core Xeon, 64 tokens)

| Model | Scheduler | tok/s | Speedup vs F16 + low_confidence |
|-------|-----------|-------|---------------------------------|
| F16 | low_confidence | 1.64 | 1.00x |
| F16 | entropy_exit | 8.74 | 5.32x |
| Q8_0 | low_confidence | 1.86 | 1.13x |
| Q8_0 | entropy_exit | 10.09 | 6.14x |
| Q4_K_M | low_confidence | 2.48 | 1.51x |
| Q4_K_M | entropy_exit | **13.59** | **8.27x** |

**Q4_K_M + entropy_exit = 13.59 tok/s** (about 1.6x the llama.cpp throughput on the same hardware)

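The speedup column is each row's throughput divided by the F16 + low_confidence baseline. This awk one-off recomputes it from the tok/s values in the table; the numbers are copied from the table, nothing is measured here.

```bash
#!/usr/bin/env bash
# Recompute "Speedup vs F16 + low_confidence" from the table's tok/s column.
# Baseline: F16 with the low_confidence scheduler (1.64 tok/s).
set -eu

speedups=$(awk -v base=1.64 '
  { printf "%-7s %-15s %6.2f tok/s  %5.2fx\n", $1, $2, $3, $3 / base }
' <<'EOF'
F16 low_confidence 1.64
F16 entropy_exit 8.74
Q8_0 low_confidence 1.86
Q8_0 entropy_exit 10.09
Q4_K_M low_confidence 2.48
Q4_K_M entropy_exit 13.59
EOF
)

printf '%s\n' "$speedups"
```

Recomputed from the rounded tok/s values, three ratios come out slightly higher than in the table (5.33x, 6.15x and 8.29x versus 5.32x, 6.14x and 8.27x), presumably because the table's ratios were computed from unrounded measurements.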
## Usage

```bash
# Build diffuse-cpp from source
git clone https://github.com/iafiscal1212/diffuse-cpp
cd diffuse-cpp && mkdir build && cd build && cmake .. && make -j
# Run the Q4_K_M model from the build directory (adjust the .gguf path to where you downloaded it)
./diffuse-cli -m llada-8b-q4km.gguf -p "What is the capital of France?" -s 16 -t 12
```
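To compare the three quantizations on your own machine, one option is a small loop that reuses the command above for each file. This is a sketch: `MODEL_DIR` is a hypothetical variable for wherever the GGUF files were downloaded, the script assumes it is run from `diffuse-cpp/build`, and it passes only the flags shown in the usage example. Missing files are skipped.

```bash
#!/usr/bin/env bash
# Sketch: run the same prompt against each quantization listed in the table.
# ASSUMPTIONS: run from diffuse-cpp/build; MODEL_DIR (hypothetical) holds
# the downloaded .gguf files. Only the documented flags -m, -p, -s, -t are used.
set -eu

MODEL_DIR=${MODEL_DIR:-.}
prompt="What is the capital of France?"
ran=0

for f in llada-8b-q4km.gguf llada-8b-q8_0.gguf llada-8b-f16.gguf; do
  model="$MODEL_DIR/$f"
  if [[ ! -f $model ]]; then
    echo "skip: $model not found" >&2
    continue
  fi
  echo "== $f"
  time ./diffuse-cli -m "$model" -p "$prompt" -s 16 -t 12
  ran=$((ran + 1))
done

echo "ran $ran of 3 models"
```

Bash's `time` keyword reports wall-clock time per run on stderr. It measures the whole process, including model loading, so it will not match the tok/s figures in the benchmark table.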