Add paper DOI reference
README.md
CHANGED
@@ -18,6 +18,8 @@ GGUF quantized versions of [GSAI-ML/LLaDA-8B-Instruct](https://huggingface.co/GS
 
 LLaDA is a **diffusion language model** that generates text by iterative unmasking rather than autoregressive token-by-token prediction.
 
+> **Paper:** [Diffusion Language Models are Faster than Autoregressive on CPU](https://doi.org/10.5281/zenodo.19119814) -- C. Esteban, 2026
+
 ## Available Quantizations
 
 | File | Quant | Size | Description |
@@ -66,9 +68,5 @@ cmake -B build -DCMAKE_BUILD_TYPE=Release
 cmake --build build -j$(nproc)
 
 # Generate with entropy_exit (recommended)
-python tools/generate.py
-  --model-dir /path/to/LLaDA-8B-Instruct \
-  --gguf llada-8b-q4km.gguf \
-  -p "What is the capital of France?" \
-  -s 16 -t 12 --remasking entropy_exit
+python tools/generate.py --model-dir /path/to/LLaDA-8B-Instruct --gguf llada-8b-q4km.gguf -p "What is the capital of France?" -s 16 -t 12 --remasking entropy_exit
 ```
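The "iterative unmasking" idea in the README can be sketched as a toy loop: start from a fully masked sequence, score every masked position, and unmask the lowest-entropy (most confident) positions each step, which is the intuition behind entropy-based remasking. This is a hypothetical illustration only; the `toy_model` function is a stand-in and none of this reflects the actual LLaDA or `tools/generate.py` implementation.

```python
# Toy sketch of diffusion-style iterative unmasking (hypothetical, NOT the
# real LLaDA code): a fake "model" assigns each masked slot a best token and
# an entropy, and the loop commits the most confident slots first.
import math
import random

MASK = "<mask>"
VOCAB = ["Paris", "is", "the", "capital", "of", "France", "."]

def toy_model(seq):
    """Return {position: (best_token, entropy)} for each masked slot.
    Deterministic stand-in for a real forward pass."""
    random.seed(sum(1 for t in seq if t == MASK))
    out = {}
    for i, tok in enumerate(seq):
        if tok == MASK:
            probs = [random.random() + 1e-9 for _ in VOCAB]
            total = sum(probs)
            probs = [p / total for p in probs]
            entropy = -sum(p * math.log(p) for p in probs)
            out[i] = (VOCAB[probs.index(max(probs))], entropy)
    return out

def generate(length=7, steps=4):
    seq = [MASK] * length
    per_step = max(1, length // steps)
    while MASK in seq:
        guesses = toy_model(seq)
        # Unmask the lowest-entropy (most confident) positions this step.
        for i, (tok, _) in sorted(guesses.items(), key=lambda kv: kv[1][1])[:per_step]:
            seq[i] = tok
    return seq

print(generate())
```

Unlike autoregressive decoding, nothing here forces left-to-right order: positions are filled in confidence order, and the whole sequence is re-scored between steps.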