GFusion-10B-A1.8B-base

GFusion-10B-A1.8B-base is an experimental pretrained diffusion language model, obtained by adapting GigaChat3-10B-A1.8B-base to block diffusion generation.

GFusion uses a block size of 32 tokens and decodes with entropy-bounded sampling. Unlike standard autoregressive generation, which emits one token per forward pass, the model iteratively refines partially masked token blocks. This lets it finalize multiple tokens in a single forward pass and gives a controllable trade-off between generation quality and decoding speed.
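
The selection step of entropy-bounded sampling can be illustrated with a small sketch. It assumes one common formulation: masked positions are visited in order of increasing entropy and added while the summed entropy of the selected set, excluding its largest term, stays within γ. GFusion's actual decoding code may differ, and `token_entropy` and `select_positions` are hypothetical names used only for illustration.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_positions(entropies, gamma):
    """Pick masked positions to finalize in one forward pass.

    Assumed rule (not taken from the GFusion code): visit positions from
    most to least confident and keep adding them while
    (sum of selected entropies - largest selected entropy) <= gamma.
    The most confident position is always accepted.
    """
    order = sorted(range(len(entropies)), key=lambda i: entropies[i])
    chosen, total = [], 0.0
    for i in order:
        new_total = total + entropies[i]
        # Ascending order means entropies[i] is the largest term so far.
        if chosen and new_total - entropies[i] > gamma:
            break
        chosen.append(i)
        total = new_total
    return chosen


# Made-up per-position entropies for four masked tokens in a block.
entropies = [0.05, 1.2, 0.1, 0.4]
print(select_positions(entropies, gamma=0.35))  # [0, 2, 3]
print(select_positions(entropies, gamma=0.0))   # [0]
```

With these made-up entropies, γ = 0.35 finalizes three positions in one pass, while γ = 0 falls back to one token per pass, which is the quality/speed trade-off described above.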

For architecture details, refer to the GigaChat3-10B-A1.8B-base model card.

More details about GFusion are available in the Habr article.

Important Note

This model card describes the base/pretrained model.

For dialogue tasks and instruction following, please use our instruct version.

Inference

We report decoding speed using TPF (tokens per forward pass): the average number of tokens finalized by the model in one forward pass.

| Decoding algorithm | Hyperparameter | Math TPF | Coding TPF | Avg. TPF |
|---|---|---|---|---|
| Threshold-based | `τ = 0.85` | ×2.2028 | ×2.0780 | ×2.1404 |
| Threshold-based | `τ = 0.90` | ×2.0033 | ×1.8662 | ×1.9348 |
| Threshold-based | `τ = 0.95` | ×1.7385 | ×1.6235 | ×1.6810 |
| Entropy-bounded | `γ = 0.70` | ×2.5786 | ×2.3755 | ×2.4771 |
| Entropy-bounded | `γ = 0.35` | ×2.1640 | ×1.9817 | ×2.0729 |
| Entropy-bounded | `γ = 0.15` | ×1.7993 | ×1.6798 | ×1.7396 |
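
As a rough illustration of how to read these numbers, the sketch below computes TPF from raw counts and converts a TPF value into an expected number of forward passes. The 512-token run is hypothetical, the function names are illustrative, and the Avg. TPF column appears to be the arithmetic mean of the Math and Coding columns.

```python
def tokens_per_forward(num_tokens, num_forward_passes):
    """Average number of tokens finalized per forward pass (TPF)."""
    if num_forward_passes <= 0:
        raise ValueError("num_forward_passes must be positive")
    return num_tokens / num_forward_passes


def forward_passes_needed(num_tokens, tpf):
    """Expected forward passes to produce num_tokens at a given TPF."""
    return num_tokens / tpf


# Hypothetical run: 512 tokens finalized in 256 forward passes.
print(tokens_per_forward(512, 256))  # 2.0

# Reported values for entropy-bounded decoding with gamma = 0.70.
math_tpf, coding_tpf = 2.5786, 2.3755
avg_tpf = (math_tpf + coding_tpf) / 2
print(round(forward_passes_needed(512, avg_tpf)))  # 207
```

For comparison, a purely autoregressive decoder has a TPF of 1 and would need 512 forward passes for the same 512 tokens.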

Benchmarks

| Benchmark | GFusion-base 10B-A1.8B | GigaChat3-base 10B-A1.8B | LLaDA-MoE-base 7B-A1.4B |
|---|---|---|---|
| MMLU | 71.73 | 71.20 | 64.59 |
| MMLU-Pro | 56.68 | 59.60 | 35.50 |
| TruthfulQA | 45.65 | 45.90 | -- |
| GSM8K | 82.18 | 79.50 | 66.41 |
| MGSM | 82.00 | 82.40 | -- |
| MATH | 24.04 | 23.10 | -- |
| MBPP | 56.40 | 55.80 | 52.40 |
| HumanEval | 50.00 | 49.40 | 45.73 |

Quickstart

HF Transformers 🤗

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "ai-sage/GFusion-10B-A1.8B-base"

# trust_remote_code=True loads the custom modeling code shipped in the model repository.
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", trust_remote_code=True
)
# The tokenizer holds no weights, so it does not take a device_map.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "Here are the KKT optimality conditions:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    block_size=32,  # tokens per diffusion block
    gamma=0.70,     # entropy bound; higher values gave higher TPF in the Inference table
)

print(tokenizer.decode(outputs[0]))
```

SGLang

GFusion support is available in SGLang PR #29776. Install SGLang from that PR branch:

```bash
git clone https://github.com/sgl-project/sglang.git
cd sglang

git fetch origin refs/pull/29776/head:gfusion
git switch gfusion

python -m pip install --upgrade pip setuptools wheel
python -m pip install -e "python"
```

Create an EBSampling config file:

```yaml
# eb_sampling.yaml
gamma: 0.15
```
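
To switch between the γ values listed in the Inference table, the config file can also be generated from a script. This is a minimal sketch, assuming `gamma` is the only key the config needs (as in the file above); `write_eb_config` is a hypothetical helper, and the non-negativity check is an assumed sanity check rather than a documented constraint.

```python
from pathlib import Path


def write_eb_config(gamma, path="eb_sampling.yaml"):
    """Write a one-key EBSampling config file and return its path."""
    if gamma < 0:
        # Assumed sanity check: an entropy bound below zero is not meaningful.
        raise ValueError("gamma must be non-negative")
    config_path = Path(path)
    config_path.write_text(f"gamma: {gamma}\n", encoding="utf-8")
    return config_path


# Example (writes into the current directory):
# write_eb_config(0.35)
```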

Start the server with entropy-bounded sampling and the FlashAttention-3 (FA3) attention backend:

```bash
python -m sglang.launch_server \
  --model-path ai-sage/GFusion-10B-A1.8B-base \
  --dllm-algorithm EBSampling \
  --dllm-algorithm-config eb_sampling.yaml \
  --attention-backend fa3 \
  --host 0.0.0.0 \
  --port 30000 \
  --dtype auto \
  --mem-fraction-static 0.88 \
  --cuda-graph-bs-decode 1
```

If FA3 is not available in your environment, use the Triton backend instead:

```bash
python -m sglang.launch_server \
  --model-path ai-sage/GFusion-10B-A1.8B-base \
  --dllm-algorithm EBSampling \
  --dllm-algorithm-config eb_sampling.yaml \
  --attention-backend triton \
  --host 0.0.0.0 \
  --port 30000 \
  --dtype auto \
  --mem-fraction-static 0.88
```

Example request for the base model:

```bash
curl http://localhost:30000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai-sage/GFusion-10B-A1.8B-base",
    "prompt": "Here are the KKT optimality conditions:",
    "max_tokens": 128,
    "temperature": 0
  }'
```
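
The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the server runs on `localhost:30000` as in the launch command and returns the usual OpenAI-style completions layout (`choices[0].text`); `build_completion_request` and `complete` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:30000"  # host and port from the launch command
MODEL = "ai-sage/GFusion-10B-A1.8B-base"


def build_completion_request(prompt, max_tokens=128, temperature=0):
    """Build a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumes the OpenAI-style response layout: choices[0].text
    return body["choices"][0]["text"]


# With the server running:
# print(complete("Here are the KKT optimality conditions:"))
```

Keeping request construction separate from sending makes the payload easy to inspect without a running server.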

Model size: 11B params (Safetensors)
Tensor type: BF16
