Instructions for using eggdog100/Qwen3.6-35B_Zenith with libraries, inference servers, and local apps.

Libraries

How to use eggdog100/Qwen3.6-35B_Zenith with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="eggdog100/Qwen3.6-35B_Zenith")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("eggdog100/Qwen3.6-35B_Zenith")
model = AutoModelForMultimodalLM.from_pretrained("eggdog100/Qwen3.6-35B_Zenith")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use eggdog100/Qwen3.6-35B_Zenith with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

# F16 is the full-precision GGUF (about 69 GB); smaller quants are listed in the GGUF section.
llm = Llama.from_pretrained(
	repo_id="eggdog100/Qwen3.6-35B_Zenith",
	filename="gguf/Qwen3.6-35B_Zenith-F16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)

llama.cpp

How to use eggdog100/Qwen3.6-35B_Zenith with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf eggdog100/Qwen3.6-35B_Zenith:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf eggdog100/Qwen3.6-35B_Zenith:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf eggdog100/Qwen3.6-35B_Zenith:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf eggdog100/Qwen3.6-35B_Zenith:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggml-org/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf eggdog100/Qwen3.6-35B_Zenith:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf eggdog100/Qwen3.6-35B_Zenith:Q4_K_M

Build from source code

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf eggdog100/Qwen3.6-35B_Zenith:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf eggdog100/Qwen3.6-35B_Zenith:Q4_K_M

Use Docker

docker model run hf.co/eggdog100/Qwen3.6-35B_Zenith:Q4_K_M

vLLM

How to use eggdog100/Qwen3.6-35B_Zenith with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "eggdog100/Qwen3.6-35B_Zenith"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "eggdog100/Qwen3.6-35B_Zenith",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
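
The same request can be issued from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server from the command above is listening on localhost:8000; the helper names `build_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

# Assumed values, taken from the curl example above.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "eggdog100/Qwen3.6-35B_Zenith"


def build_request(prompt, image_url=None):
    """Build an OpenAI-style chat completion request (hypothetical helper)."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        # Images are passed by URL, in the same shape as the curl payload.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, image_url=None, timeout=120):
    """Send the request and return the first choice's text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, image_url), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example call, once the server is up:
# print(chat("Describe this image in one sentence.",
#            "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"))
```

Pointing `BASE_URL` at port 30000 should work the same way against the SGLang server described further down, since both expose an OpenAI-compatible endpoint.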

Use Docker

docker model run hf.co/eggdog100/Qwen3.6-35B_Zenith:Q4_K_M

SGLang

How to use eggdog100/Qwen3.6-35B_Zenith with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "eggdog100/Qwen3.6-35B_Zenith" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "eggdog100/Qwen3.6-35B_Zenith",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "eggdog100/Qwen3.6-35B_Zenith" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "eggdog100/Qwen3.6-35B_Zenith",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Ollama
How to use eggdog100/Qwen3.6-35B_Zenith with Ollama:
```
ollama run hf.co/eggdog100/Qwen3.6-35B_Zenith:Q4_K_M
```

Unsloth Studio

How to use eggdog100/Qwen3.6-35B_Zenith with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for eggdog100/Qwen3.6-35B_Zenith to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for eggdog100/Qwen3.6-35B_Zenith to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for eggdog100/Qwen3.6-35B_Zenith to start chatting

Pi

How to use eggdog100/Qwen3.6-35B_Zenith with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf eggdog100/Qwen3.6-35B_Zenith:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "eggdog100/Qwen3.6-35B_Zenith:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use eggdog100/Qwen3.6-35B_Zenith with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf eggdog100/Qwen3.6-35B_Zenith:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default eggdog100/Qwen3.6-35B_Zenith:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use eggdog100/Qwen3.6-35B_Zenith with Docker Model Runner:
```
docker model run hf.co/eggdog100/Qwen3.6-35B_Zenith:Q4_K_M
```

Lemonade

How to use eggdog100/Qwen3.6-35B_Zenith with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull eggdog100/Qwen3.6-35B_Zenith:Q4_K_M

Run and chat with the model

lemonade run user.Qwen3.6-35B_Zenith-Q4_K_M

List all available models

lemonade list

Qwen3.6-35B-A3B — Capability SFT (LoRA, vision-preserved)

A LoRA supervised fine-tune of Qwen/Qwen3.6-35B-A3B that targets math, code, tool-calling, and natural, human-like conversation while keeping the model's other abilities, including vision, intact.

Private, self-use derivative. Built entirely on openly-licensed components.

What was changed

Method: LoRA SFT on the text decoder only. The vision tower, the 256 routed experts, and the router are frozen — the vision tower ships bit-identical to the base.
Trainable: 16.71M params (~0.05% of 35.1B) — LoRA on attention q/k/v/o_proj + the per-layer shared_expert gate/up/down_proj.
Base model: Qwen/Qwen3.6-35B-A3B (Apache-2.0), a hybrid linear+full-attention multimodal MoE (40 layers, 256 experts / 8 active, ~3B active params, vision tower).
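
To illustrate the selection rule described above, here is a minimal sketch of a regex that picks out the attention projections and the shared-expert projections while skipping routed experts, the router, and the vision tower. The module paths are hypothetical placeholders, not names read from the checkpoint, and `fullmatch` is used on the assumption that it mirrors how PEFT treats a string-valued `target_modules`. The last lines reproduce the trainable-parameter share from the figures quoted above.

```python
import re

# Hypothetical module paths for illustration; real names in the checkpoint may differ.
MODULE_NAMES = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.shared_expert.gate_proj",
    "model.layers.0.mlp.shared_expert.up_proj",
    "model.layers.0.mlp.shared_expert.down_proj",
    "model.layers.0.mlp.experts.17.gate_proj",  # routed expert: stays frozen
    "model.layers.0.mlp.gate",                  # router: stays frozen
    "visual.blocks.0.attn.qkv",                 # vision tower: stays frozen
]

# Attention q/k/v/o projections plus the shared expert's gate/up/down projections.
TARGET_PATTERN = re.compile(
    r".*\.(self_attn\.(q|k|v|o)_proj|shared_expert\.(gate|up|down)_proj)"
)


def lora_targets(names):
    """Return the module names that would receive LoRA adapters under this pattern."""
    return [name for name in names if TARGET_PATTERN.fullmatch(name)]


selected = lora_targets(MODULE_NAMES)
for name in selected:
    print(name)

# Trainable share from the figures quoted above: 16.71M of 35.1B parameters.
trainable_fraction = 16.71e6 / 35.1e9
print(f"trainable share: {trainable_fraction:.4%}")
```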

Training data (full provenance — none distilled from closed frontier models)

All data is openly licensed and either human-authored or generated by open-weights models (DeepSeek-R1, Llama-3.1, Qwen2.5, StarCoder2). No GPT / Claude / Gemini outputs are used anywhere. ~98,074 examples total.

| Capability | Dataset | License | Provenance | N |
|---|---|---|---|---|
| Math | `open-r1/OpenR1-Math-220k` | Apache-2.0 | Human problems (NuminaMath); solutions by DeepSeek-R1 (open), kept only if verified correct by Math-Verify | 30,000 |
| Code | `bigcode/self-oss-instruct-sc2-exec-filter-50k` | ODC-BY | StarCoder2-15B (open), execution-filtered | 20,000 |
| Tool-calling | `argilla/Synth-APIGen-v0.1` | Apache-2.0 | Llama-3.1-70B + Qwen2.5-72B (open), exec-verified | 20,000 |
| Natural voice | `OpenAssistant/oasst1` | Apache-2.0 | Human (en+zh, best-ranked replies) | ~8,575 |
| Natural voice | `HuggingFaceH4/no_robots` | CC-BY-NC-4.0 ⚠️ | Human-written | ~9,499 |
| Emotion/natural | `Estwld/empathetic_dialogues_llm` | CC-BY-NC ⚠️ | Human dialogue | 10,000 |

⚠️ NonCommercial note: no_robots and empathetic_dialogues are CC-BY-NC. They are included for the "talks like a human" goal; consequently the resulting weights inherit a non-commercial restriction. (License of this derivative: cc-by-nc-4.0; an NC-clean rebuild is possible by dropping these two sources.)
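
As a quick check on the mix, the sketch below sums the per-dataset counts listed above and shows what an NC-clean rebuild would retain. The counts and license strings are copied from the table; the `is_noncommercial` helper is illustrative only.

```python
# (dataset, license, number of examples), as listed in the training-data table.
SOURCES = [
    ("open-r1/OpenR1-Math-220k", "Apache-2.0", 30_000),
    ("bigcode/self-oss-instruct-sc2-exec-filter-50k", "ODC-BY", 20_000),
    ("argilla/Synth-APIGen-v0.1", "Apache-2.0", 20_000),
    ("OpenAssistant/oasst1", "Apache-2.0", 8_575),
    ("HuggingFaceH4/no_robots", "CC-BY-NC-4.0", 9_499),
    ("Estwld/empathetic_dialogues_llm", "CC-BY-NC", 10_000),
]


def is_noncommercial(license_id):
    """Illustrative check: treat any license containing an NC clause as non-commercial."""
    return "NC" in license_id.upper().split("-")


total = sum(n for _, _, n in SOURCES)
nc_clean = [(name, lic, n) for name, lic, n in SOURCES if not is_noncommercial(lic)]
nc_clean_total = sum(n for _, _, n in nc_clean)

print(f"all sources:      {total:,} examples")
print(f"NC-clean rebuild: {nc_clean_total:,} examples")
print(f"share dropped:    {(total - nc_clean_total) / total:.1%}")
```

Dropping the two CC-BY-NC sources removes roughly a fifth of the examples, all of them from the conversational portion of the mix.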

Training configuration

Framework: TRL SFTTrainer + PEFT LoRA, transformers 5.12.
LoRA: r=32, α=64, dropout=0.05, bias=none.
Optim: bf16, packing (max_len 4096), per-device batch 4 × grad-accum 8 (eff. 32), LR 1e-4 cosine, warmup 3%, 1 epoch, AdamW, grad-clip 1.0.
Attention: SDPA (cuDNN attn disabled for head_dim=256); linear-attention layers via flash-linear-attention + tilelang (Hopper).
Hardware: 1× NVIDIA H200 NVL (141 GB), ~6–7 h.
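
The batch and learning-rate settings above can be sketched numerically. This is a minimal illustration, assuming linear warmup followed by cosine decay to zero (the usual shape of a cosine schedule with warmup); `TOTAL_STEPS` is a hypothetical value, since the actual step count is not stated.

```python
import math

PEAK_LR = 1e-4
WARMUP_RATIO = 0.03
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 8
MAX_LEN = 4096
TOTAL_STEPS = 1000  # hypothetical; the real number of optimizer steps is not given

# Sequences per optimizer step on a single GPU.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
# With packing, each sequence holds up to MAX_LEN tokens, so this is an upper bound.
tokens_per_step = effective_batch * MAX_LEN


def lr_at(step, total_steps=TOTAL_STEPS):
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch: {effective_batch} sequences, up to {tokens_per_step:,} tokens per step")
for step in (0, 15, 30, 500, 1000):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```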

Evaluation — vs. independent base measurements (not the inflated official numbers)

Independent reproductions of Qwen3.6-35B-A3B land well below Qwen's official numbers, which appear to rely on Qwen-specific evaluation prompts that standard harnesses do not replicate. For example, a direct re-test of the base on SuperGPQA gives 55.7, versus the official 64.7. Zenith is therefore compared against independent base measurements rather than the official figures:

| Benchmark | Base (independent)¹ | Zenith² | Δ |
|---|---|---|---|
| MMLU-Pro | 75.0 | 76.9 | +1.9 ✅ |
| SuperGPQA | 55.7 | 56.3 | +0.6 ✅ |
| GPQA-Diamond | 84.0 | 72.0 | −12.0 ⚠️ |
| AIME 2026 | n/a | 73.3 | n/a |
| MMStar (vision regression) | n/a | 52.7 | n/a |

¹ MMLU-Pro and GPQA-Diamond: Artificial Analysis's own independent evaluation. SuperGPQA: re-tested directly on the base Qwen3.6-35B-A3B. Official Qwen numbers (MMLU-Pro 85.2 / GPQA 86 / SuperGPQA 64.7) are not reproducible with standard harnesses.
² EvalScope, sampled subsets, single-sample (n=1).

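Verdict: no clear evidence of regression. Where independent base numbers exist (MMLU-Pro, SuperGPQA), Zenith scores slightly above the base (76.9 vs 75.0; 56.3 vs 55.7), although with sampled subsets and n=1 gaps this small are not conclusive. The one outlier, GPQA-Diamond at 72.0 vs Artificial Analysis's 84.0, comes from our small-sample EvalScope run (50 questions, n=1, ~6% answer-extraction failures) and from a different harness than the base figure, so it most likely reflects measurement noise rather than real degradation; the other reasoning benchmarks show no drop.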
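
To put the GPQA-Diamond gap in context, the sketch below computes the sampling error of a 50-question, single-sample accuracy using a normal approximation to the binomial. This is a rough illustration under that assumption, not the evaluation code used for the table, and it ignores the answer-extraction failures.

```python
import math


def accuracy_interval(accuracy, n_questions, z=1.96):
    """Standard error and approximate 95% interval for an accuracy on n questions."""
    se = math.sqrt(accuracy * (1.0 - accuracy) / n_questions)
    return se, (accuracy - z * se, accuracy + z * se)


# GPQA-Diamond run from the table: 72.0% on 50 sampled questions, n=1.
se, (low, high) = accuracy_interval(0.72, 50)
print(f"standard error: {se:.3f}")
print(f"approx. 95% interval: [{low:.3f}, {high:.3f}]")
print("base score 0.84 inside interval:", low < 0.84 < high)
```

With only 50 questions the standard error is about 6 points, so the interval spans roughly 12 points either side of 72 and the base's 84.0 sits near its upper edge.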

Conversational style — the primary goal ("talk like a human, not a robot")

The empathy/voice SFT noticeably de-roboticizes the model in conversational & emotional contexts (thinking off):

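(Exchanges translated from Chinese.)

User: Tell me what you really think about "involution" (内卷), the way a friend would over a chat.
Zenith: Honestly, the word "involution" is everywhere these days, and it's really not that young people are being dramatic… It's like a crowd squeezing into an elevator: it's clearly already full, yet everyone keeps shoving their way in, and in the end nobody goes up… What it wears down isn't your body, it's your spirit… Life isn't a hundred-meter sprint, right? Walk slowly and you'll still reach the finish line.

User: My boss criticized me in front of everyone today, and I feel pretty awful.
Zenith: Hearing you say that, I can tell you must be feeling really wronged and upset right now… Let me give you a big hug first~… You already put so much care into your work, and one setback won't hide what makes you shine… Take it slow, things will get better. 🌿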

GGUF quants (llama.cpp / LM Studio / Ollama)

Converted with convert_hf_to_gguf.py --no-mtp — the multi-token-prediction draft head is excluded (it makes the GGUF fail to load otherwise: missing tensor 'blk.40.attn_norm.weight'); normal inference is unaffected. For image input, pass an mmproj file alongside the model.

| File | Size | Notes |
|---|---|---|
| `Qwen3.6-35B_Zenith-Q4_K_M.gguf` | 21 GB | recommended; best size/quality tradeoff |
| `Qwen3.6-35B_Zenith-Q6_K.gguf` | 28 GB | higher quality, near-lossless |
| `Qwen3.6-35B_Zenith-Q8_0.gguf` | 37 GB | near-lossless |
| `Qwen3.6-35B_Zenith-IQ2_XXS.gguf` | 9.5 GB | extreme low-memory (imatrix-calibrated) |
| `Qwen3.6-35B_Zenith-F16.gguf` | 69 GB | full precision (for re-quantizing) |
| `Qwen3.6-35B_Zenith-mmproj-f16.gguf` | 0.9 GB | vision projector; pair with the model for image input |
| `Qwen3.6-35B_Zenith-mmproj-f32.gguf` | 1.8 GB | vision projector (f32) |
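
One way to use the sizes above is to pick the largest quant that fits a given memory budget. The sketch below is only a rough guide: the file sizes come from the table, while `headroom_gb` (space reserved for the KV cache and runtime buffers) is an assumed placeholder, and actual memory use depends on context length and backend.

```python
# File sizes in GB, copied from the table above.
GGUF_SIZES_GB = {
    "Qwen3.6-35B_Zenith-IQ2_XXS.gguf": 9.5,
    "Qwen3.6-35B_Zenith-Q4_K_M.gguf": 21,
    "Qwen3.6-35B_Zenith-Q6_K.gguf": 28,
    "Qwen3.6-35B_Zenith-Q8_0.gguf": 37,
    "Qwen3.6-35B_Zenith-F16.gguf": 69,
}
MMPROJ_F16_GB = 0.9  # vision projector, needed only for image input


def pick_quant(memory_gb, headroom_gb=4.0, with_vision=False):
    """Return the largest GGUF that fits, or None if nothing does.

    headroom_gb is an assumed reserve for KV cache and buffers, not a measured value.
    """
    budget = memory_gb - headroom_gb - (MMPROJ_F16_GB if with_vision else 0.0)
    fitting = {name: size for name, size in GGUF_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


for memory in (12, 24, 32, 48, 80):
    print(f"{memory} GB -> {pick_quant(memory, with_vision=True)}")
```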

⚠️ Brand-new architecture caveat: Qwen3.6-35B-A3B (hybrid GatedDeltaNet linear-attention + full-attention MoE) is newly supported in llama.cpp. These GGUFs load and generate correct output, but the runtime kernels for this arch are still maturing upstream, so expect lower throughput than mature architectures until those kernels land. For full-speed serving, use the bf16 weights via vLLM / transformers.

Attribution / licenses

Derivative of Qwen/Qwen3.6-35B-A3B (Apache-2.0; NOTICE retained). Datasets under Apache-2.0 / ODC-BY / CC-BY-NC-4.0 as listed (attribution preserved). Tools: TRL, PEFT, transformers, flash-linear-attention, tilelang, vLLM, EvalScope — all open-source.

Downloads last month: 753
Model size: 35B params (Safetensors, BF16)
Base model: Qwen/Qwen3.6-35B-A3B