Libraries

How to use SparkyForge/Cinder with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="SparkyForge/Cinder")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
print(pipe(text=messages))

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("SparkyForge/Cinder")
model = AutoModelForMultimodalLM.from_pretrained("SparkyForge/Cinder")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use SparkyForge/Cinder with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SparkyForge/Cinder"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SparkyForge/Cinder",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

SGLang

How to use SparkyForge/Cinder with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "SparkyForge/Cinder" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SparkyForge/Cinder",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
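
The curl request above can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server exposes the OpenAI-compatible `/v1/chat/completions` route on the port used in the commands above. The helper names `build_image_request` and `chat` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request


def build_image_request(model, prompt, image_url):
    """Build the same JSON body as the curl example: one text part, one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(base_url, payload, timeout=120):
    """POST the payload to an OpenAI-compatible server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_image_request(
    "SparkyForge/Cinder",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)

# With a server running, send the request like this
# (30000 is the SGLang port above; the vLLM example uses 8000):
# print(chat("http://localhost:30000", payload))
```

Keeping payload construction separate from the network call makes it easy to point the same request at either server by changing only the base URL.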

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "SparkyForge/Cinder" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request shown above (port 30000).

Docker Model Runner
How to use SparkyForge/Cinder with Docker Model Runner:
```
docker model run hf.co/SparkyForge/Cinder
```

Cinder — Qwen3.6-35B-A3B (abliterated, NVFP4)

Cinder is the NVFP4 quantization of Ember, the abliterated (refusal-removed) build of Qwen/Qwen3.6-35B-A3B. It carries the same surgical abliteration at about one third the size: **22 GB** versus ~66 GB for the BF16 Ember.

For the full method writeup, retention evidence, and the BF16 weights, see Ember. The patch and method are in heretic-fused-moe-abliteration.

Not affiliated with NVIDIA or the Apache Software Foundation. Independent community model.

What it is

Format: NVFP4 via compressed-tensors / llm-compressor. FP4 weights with FP8 block scales, NVFP4 activation scheme.
Hardware: needs an NVIDIA Blackwell GPU (sm_120 / sm_121, e.g. RTX 50-series, DGX Spark / GB10) and a recent vLLM with NVFP4 support. It will not run on older GPUs. If you're on anything pre-Blackwell, use Ember (BF16) and quantize to your own format.
Size: ~22 GB on disk, which fits comfortably in the DGX Spark's unified memory with room for a long context and a speculative drafter.
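
The size figures above can be sanity-checked from bits per weight. This is a rough back-of-the-envelope sketch, not a measurement. It assumes the usual NVFP4 layout of 4-bit values with one 8-bit (FP8) scale per block of 16 weights (the block size is not stated in this card), and it ignores per-tensor scales and file metadata.

```python
def bits_per_weight(value_bits, scale_bits, block_size):
    """Average storage cost per weight when each block shares one scale."""
    return value_bits + scale_bits / block_size


# Assumption: NVFP4 = 4-bit values + one FP8 scale per 16-weight block.
nvfp4_bits = bits_per_weight(value_bits=4, scale_bits=8, block_size=16)
bf16_bits = 16

# Size the ~66 GB BF16 checkpoint would shrink to if every weight were NVFP4.
bf16_size_gb = 66
lower_bound_gb = bf16_size_gb * nvfp4_bits / bf16_bits

print(f"NVFP4 bits per weight: {nvfp4_bits}")
print(f"All-NVFP4 lower bound: {lower_bound_gb:.1f} GB")
```

Under these assumptions the all-quantized floor comes out below the reported 22 GB, which is consistent with part of the model staying in BF16.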

Quantization details (and what was deliberately not quantized)

The fused MoE experts are FP4-packed; the hybrid layers are preserved in BF16. Verified post-quant:

30,720 expert weight tensors FP4-packed, 0 experts silently left in BF16 (the fused-expert handling carried through quantization).
The 30 linear-attention (Mamba/GDN) layers stayed in BF16 because quantizing them breaks the model. The ignore list covers linear_attn, mlp.gate, shared_expert_gate, embed_tokens, lm_head, and the vision tower.
Quantization scales are clean, with no NaNs.
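
To illustrate how an ignore list like the one above splits a model into quantized and preserved modules, here is a minimal sketch. It is not llm-compressor's own matching logic: it uses a simple dotted-segment match as a stand-in, the module names are hypothetical examples, and `visual` is an assumed name for the vision tower. The authoritative list is the one in recipe.yaml.

```python
# Ignore entries taken from the list above; "visual" is an assumed module
# name for the vision tower, used here only for illustration.
IGNORE_PATTERNS = [
    "linear_attn",
    "mlp.gate",
    "shared_expert_gate",
    "embed_tokens",
    "lm_head",
    "visual",
]


def matches(name, pattern):
    """True if the pattern's dotted segments appear contiguously in the name.

    Matching whole segments keeps "mlp.gate" (the router) from also catching
    "mlp.gate_proj", which a plain substring test would do.
    """
    parts = name.split(".")
    wanted = pattern.split(".")
    n = len(wanted)
    return any(parts[i:i + n] == wanted for i in range(len(parts) - n + 1))


def is_ignored(name, patterns=IGNORE_PATTERNS):
    return any(matches(name, p) for p in patterns)


# Hypothetical module names, not read from the actual checkpoint.
examples = [
    "model.layers.0.linear_attn.in_proj",
    "model.layers.0.mlp.gate",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.shared_expert_gate",
    "model.layers.3.self_attn.q_proj",
    "lm_head",
]
for name in examples:
    print(f"{name}: {'kept in BF16' if is_ignored(name) else 'quantized'}")
```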

Quant recipe ships in recipe.yaml.

Usage (vLLM, Blackwell)

vllm serve <path-to-cinder> \
  --quantization compressed-tensors \
  --max-model-len 131072 \
  --enable-auto-tool-choice --tool-call-parser qwen3_coder --reasoning-parser qwen3 \
  --trust-remote-code

Vision-language (image-text-to-text): image input works; the vision tower is BF16 and untouched by quantization.
Thinking: toggled per request through chat_template_kwargs, e.g. {"enable_thinking": false} to turn it off.
Pairs with the public z-lab DFlash drafter for ~1.5× decode speedup via speculative decoding (not included).
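
As a sketch of the per-request thinking toggle, the request body below passes `chat_template_kwargs` as a top-level field, which vLLM's OpenAI-compatible server accepts. The helper name `build_chat_request` and the prompt are hypothetical, and the model name is an assumption: it must match whatever name or path the server was started with.

```python
import json


def build_chat_request(model, prompt, enable_thinking):
    """Build a chat completion body that sets the thinking flag for this request only."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Forwarded to the chat template; False turns thinking off.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }


request_body = json.dumps(
    build_chat_request(
        model="SparkyForge/Cinder",  # assumption: use the name your server reports
        prompt="Summarize NVFP4 in one sentence.",
        enable_thinking=False,
    )
)
print(request_body)
```

The resulting JSON can be sent with curl `--data` or any HTTP client to the server's `/v1/chat/completions` route.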

Safety

Refusal behavior is removed (same as Ember). You own the guardrails. Research / red-team / operator-controlled use.

License & attribution

License: Apache 2.0 (inherited from base). See LICENSE / NOTICE. Modified from Qwen3.6-35B-A3B (abliteration + NVFP4 quantization).
Abliteration: built on Heretic (Philipp Emanuel Weidmann) + a fused-MoE patch (see Ember).
Quantization: llm-compressor (NVFP4).

The smaller, hardier cousin of Ember — forged by Sparky on a DGX Spark. A cinder: what's left when the ember has done its work, and it still burns. 🔥

Safetensors

Model size: 21B params
Tensor types: F32, BF16, F8_E4M3

Model tree for SparkyForge/Cinder

Base model: Qwen/Qwen3.6-35B-A3B
Quantized (526): this model