GFusion-10B-A1.8B-base

GFusion-10B-A1.8B-base is an experimental pretrained diffusion language model trained by adapting GigaChat3-10B-A1.8B-base to block diffusion generation.

GFusion uses a block size of 32 tokens and performs decoding with entropy-bounded sampling. In contrast to standard autoregressive generation, the model iteratively refines partially masked token blocks. This allows it to finalize multiple tokens in a single forward pass and provides a controllable trade-off between generation quality and decoding speed.
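As a rough illustration of how one pass can finalize several tokens, the sketch below implements one common formulation of entropy-bounded unmasking. It is an assumption, not GFusion's confirmed rule: positions are visited from lowest to highest entropy, and a position is accepted while the sum of the accepted entropies minus the largest one stays within the budget `gamma`. The function names and toy distributions are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_entropy_bounded(dists, gamma):
    """Return indices of masked positions to finalize in this pass.

    dists: one probability distribution per still-masked position.
    gamma: entropy budget; larger values finalize more tokens per pass.
    """
    entropies = [entropy(d) for d in dists]
    # Visit positions from most to least certain (lowest entropy first).
    order = sorted(range(len(dists)), key=lambda i: entropies[i])
    chosen, total = [], 0.0
    for i in order:
        # The candidate has the largest entropy so far, so
        # "sum minus max" over chosen + [i] equals the running total.
        if chosen and total > gamma:
            break
        chosen.append(i)
        total += entropies[i]
    return sorted(chosen)

# Toy predictions for four masked positions over a two-token vocabulary.
dists = [[0.99, 0.01], [0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
for gamma in (0.15, 0.70):
    print(gamma, select_entropy_bounded(dists, gamma))
```

The most certain position is always accepted, so every pass makes progress; raising `gamma` accepts more positions per pass at the cost of committing to less certain tokens.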

For architecture details, please refer to GigaChat3-10B-A1.8B-base.

More details about GFusion are available in the Habr article.

Important Note

This model card describes the base/pretrained model.

For dialogue tasks and instruction following, please use our instruct version.

Inference

We report decoding speed using TPF (tokens per forward pass): the average number of tokens finalized by the model in one forward pass.
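The definition above reduces to a simple ratio. A minimal sketch, assuming a per-pass count of finalized tokens has been logged; the trace below is made up for illustration and is not a measured result.

```python
def tokens_per_forward(tokens_finalized_per_pass):
    """Average number of tokens finalized per forward pass (TPF)."""
    if not tokens_finalized_per_pass:
        raise ValueError("need at least one forward pass")
    return sum(tokens_finalized_per_pass) / len(tokens_finalized_per_pass)

# Hypothetical trace: one 32-token block decoded in 14 forward passes.
trace = [4, 3, 2, 2, 3, 1, 2, 3, 2, 2, 3, 2, 2, 1]
print(f"TPF = {tokens_per_forward(trace):.4f}")

# Autoregressive decoding finalizes exactly one token per pass.
print(f"TPF = {tokens_per_forward([1] * 32):.4f}")
```

Under this definition, one-token-per-pass decoding has a TPF of 1, so a TPF above 1 means fewer forward passes for the same number of generated tokens.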

| Decoding algorithm | Hyperparameter | Math TPF | Coding TPF | Avg. TPF |
|---|---|---|---|---|
| Threshold-based | τ = 0.85 | ×2.2028 | ×2.0780 | ×2.1404 |
| Threshold-based | τ = 0.90 | ×2.0033 | ×1.8662 | ×1.9348 |
| Threshold-based | τ = 0.95 | ×1.7385 | ×1.6235 | ×1.6810 |
| Entropy-bounded | γ = 0.70 | ×2.5786 | ×2.3755 | ×2.4771 |
| Entropy-bounded | γ = 0.35 | ×2.1640 | ×1.9817 | ×2.0729 |
| Entropy-bounded | γ = 0.15 | ×1.7993 | ×1.6798 | ×1.7396 |
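The model card does not define the threshold-based rule, so the following is only a sketch under an assumption: a masked position is finalized when its top predicted probability reaches τ, and the single most confident position is finalized when none qualifies. The helper name and the toy distributions are hypothetical.

```python
def select_by_threshold(dists, tau):
    """Return indices of masked positions whose top probability is >= tau.

    Falls back to the single most confident position so that every
    forward pass finalizes at least one token.
    """
    confidences = [max(d) for d in dists]
    chosen = [i for i, c in enumerate(confidences) if c >= tau]
    if not chosen:
        best = max(range(len(dists)), key=lambda i: confidences[i])
        chosen = [best]
    return chosen

# Toy predictions for four masked positions over a two-token vocabulary.
toy_dists = [[0.99, 0.01], [0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
for tau in (0.85, 0.95):
    print(tau, select_by_threshold(toy_dists, tau))
```

Under this rule a lower τ accepts more positions per pass, which matches the direction of the table: TPF rises as τ decreases.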

Benchmarks

| Benchmark | GFusion-base 10B-A1.8B | GigaChat3-base 10B-A1.8B | LLaDA-MoE-base 7B-A1.4B |
|---|---|---|---|
| MMLU | 71.73 | 71.20 | 64.59 |
| MMLU-Pro | 56.68 | 59.60 | 35.50 |
| TruthfulQA | 45.65 | 45.90 | -- |
| GSM8K | 82.18 | 79.50 | 66.41 |
| MGSM | 82.00 | 82.40 | -- |
| MATH | 24.04 | 23.10 | -- |
| MBPP | 56.40 | 55.80 | 52.40 |
| HumanEval | 50.00 | 49.40 | 45.73 |

Quickstart

HF Transformers 🤗

from transformers import AutoTokenizer, AutoModelForCausalLM

device = "auto"
model_path = "ai-sage/GFusion-10B-A1.8B-base"

model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map=device, trust_remote_code=True
)
# The tokenizer holds no weights, so it does not need a device_map.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "Here are the KKT optimality conditions:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    block_size=32,
    gamma=0.70
)

print(tokenizer.decode(outputs[0]))

SGLang

GFusion support is available in SGLang PR #29776:

git clone https://github.com/sgl-project/sglang.git
cd sglang

git fetch origin refs/pull/29776/head:gfusion
git switch gfusion

python -m pip install --upgrade pip setuptools wheel
python -m pip install -e "python"

Create an EBSampling config file:

# eb_sampling.yaml
gamma: 0.15

Start the server with entropy-bounded sampling and FA3 attention:

python -m sglang.launch_server \
  --model-path ai-sage/GFusion-10B-A1.8B-base \
  --dllm-algorithm EBSampling \
  --dllm-algorithm-config eb_sampling.yaml \
  --attention-backend fa3 \
  --host 0.0.0.0 \
  --port 30000 \
  --dtype auto \
  --mem-fraction-static 0.88 \
  --cuda-graph-bs-decode 1

If FA3 is not available in your environment, use the Triton backend instead:

python -m sglang.launch_server \
  --model-path ai-sage/GFusion-10B-A1.8B-base \
  --dllm-algorithm EBSampling \
  --dllm-algorithm-config eb_sampling.yaml \
  --attention-backend triton \
  --host 0.0.0.0 \
  --port 30000 \
  --dtype auto \
  --mem-fraction-static 0.88

Example request for the base model:

curl http://localhost:30000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai-sage/GFusion-10B-A1.8B-base",
    "prompt": "Here are the KKT optimality conditions:",
    "max_tokens": 128,
    "temperature": 0
  }'
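
The same request can be sent from Python using only the standard library. This is a sketch that mirrors the curl call; the helper names are hypothetical, and the response parsing assumes the usual OpenAI-style completions shape (`choices[0].text`), which the model card does not show.

```python
import json
import urllib.request

def build_completion_request(prompt, max_tokens=128, temperature=0,
                             base_url="http://localhost:30000"):
    """Build a POST request for the server's /v1/completions endpoint."""
    payload = {
        "model": "ai-sage/GFusion-10B-A1.8B-base",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def complete(prompt, **kwargs):
    """Send the request and return the generated text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: OpenAI-style response with the text under choices[0].
    return body["choices"][0]["text"]
```

With the server running, `complete("Here are the KKT optimality conditions:")` returns the continuation as a string.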