Add paper DOI reference
README.md
CHANGED
@@ -18,6 +18,8 @@ GGUF quantized versions of [GSAI-ML/LLaDA-8B-Instruct](https://huggingface.co/GS
 
 LLaDA is a **diffusion language model** that generates text by iterative unmasking rather than autoregressive token-by-token prediction.
 
+> **Paper:** [Diffusion Language Models are Faster than Autoregressive on CPU](https://doi.org/10.5281/zenodo.19119814) -- C. Esteban, 2026
+
 ## Available Quantizations
 
 | File | Quant | Size | Description |
@@ -66,9 +68,5 @@ cmake -B build -DCMAKE_BUILD_TYPE=Release
 cmake --build build -j$(nproc)
 
 # Generate with entropy_exit (recommended)
-python tools/generate.py
-  --model-dir /path/to/LLaDA-8B-Instruct \
-  --gguf llada-8b-q4km.gguf \
-  -p "What is the capital of France?" \
-  -s 16 -t 12 --remasking entropy_exit
+python tools/generate.py --model-dir /path/to/LLaDA-8B-Instruct --gguf llada-8b-q4km.gguf -p "What is the capital of France?" -s 16 -t 12 --remasking entropy_exit
 ```
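The "iterative unmasking" idea in the README can be sketched as a toy loop: start from a fully masked sequence, score every masked position, and unmask the lowest-entropy (most confident) positions each step, which is the intuition behind entropy-based remasking. This is a hypothetical illustration only; the `toy_model` function is a stand-in and none of this reflects the actual LLaDA or `tools/generate.py` implementation.

```python
# Toy sketch of diffusion-style iterative unmasking (hypothetical, NOT the
# real LLaDA code): a fake "model" assigns each masked slot a best token and
# an entropy, and the loop commits the most confident slots first.
import math
import random

MASK = "<mask>"
VOCAB = ["Paris", "is", "the", "capital", "of", "France", "."]

def toy_model(seq):
    """Return {position: (best_token, entropy)} for each masked slot.
    Deterministic stand-in for a real forward pass."""
    random.seed(sum(1 for t in seq if t == MASK))
    out = {}
    for i, tok in enumerate(seq):
        if tok == MASK:
            probs = [random.random() + 1e-9 for _ in VOCAB]
            total = sum(probs)
            probs = [p / total for p in probs]
            entropy = -sum(p * math.log(p) for p in probs)
            out[i] = (VOCAB[probs.index(max(probs))], entropy)
    return out

def generate(length=7, steps=4):
    seq = [MASK] * length
    per_step = max(1, length // steps)
    while MASK in seq:
        guesses = toy_model(seq)
        # Unmask the lowest-entropy (most confident) positions this step.
        for i, (tok, _) in sorted(guesses.items(), key=lambda kv: kv[1][1])[:per_step]:
            seq[i] = tok
    return seq

print(generate())
```

Unlike autoregressive decoding, nothing here forces left-to-right order: positions are filled in confidence order, and the whole sequence is re-scored between steps.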