Qwen3.6-35B-A3B — Capability SFT (LoRA, vision-preserved)
A LoRA supervised-fine-tune of Qwen/Qwen3.6-35B-A3B that strengthens math, code, tool-calling, and natural human-like conversation while keeping the model's other abilities — including its vision capability — intact.
Private, self-use derivative. Built entirely on openly-licensed components.
What was changed
- Method: LoRA SFT on the text decoder only. The vision tower, the 256 routed experts, and the router are frozen — the vision tower ships bit-identical to the base.
- Trainable: 16.71M params (~0.05% of 35.1B): LoRA on attention `q/k/v/o_proj` + the per-layer `shared_expert` `gate/up/down_proj`.
- Base model: `Qwen/Qwen3.6-35B-A3B` (Apache-2.0), a hybrid linear+full-attention multimodal MoE (40 layers, 256 experts / 8 active, ~3B active params, vision tower).
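The selection rule described above (adapt attention projections and shared-expert projections, leave the vision tower, routed experts, and router frozen) can be sketched as a name filter. This is only an illustration: the module-name patterns and the `model.visual.` prefix are assumptions, not taken from the training code, and should be checked against `model.named_modules()` on the real checkpoint.

```python
import re

# Hypothetical module-name patterns. The real names depend on the
# transformers implementation of the base model.
ATTN_PROJ = re.compile(r"\.self_attn\.(q|k|v|o)_proj$")
SHARED_EXPERT_PROJ = re.compile(r"\.shared_expert\.(gate|up|down)_proj$")
FROZEN_PREFIXES = ("model.visual.",)  # assumed prefix of the vision tower


def is_lora_target(name: str) -> bool:
    """Return True if a module name would receive a LoRA adapter."""
    if name.startswith(FROZEN_PREFIXES):
        return False
    return bool(ATTN_PROJ.search(name) or SHARED_EXPERT_PROJ.search(name))


def trainable_percent(trainable: float, total: float) -> float:
    """Share of parameters that are trainable, in percent."""
    return 100.0 * trainable / total


# Illustrative names only (assumed layout).
EXAMPLES = [
    "model.language_model.layers.3.self_attn.q_proj",           # adapted
    "model.language_model.layers.3.mlp.shared_expert.up_proj",  # adapted
    "model.language_model.layers.3.mlp.experts.17.up_proj",     # routed expert, frozen
    "model.language_model.layers.3.mlp.gate",                   # router, frozen
    "model.visual.blocks.0.attn.proj",                          # vision tower, frozen
]

selected = [name for name in EXAMPLES if is_lora_target(name)]
percent = trainable_percent(16.71e6, 35.1e9)  # figures quoted in the list above
```

With the figures quoted above, `percent` comes out just under 0.05, which matches the stated "~0.05%".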
Training data (full provenance — none distilled from closed frontier models)
All data is openly licensed and either human-authored or generated by open-weights models (DeepSeek-R1, Llama-3.1, Qwen2.5, StarCoder2). No GPT / Claude / Gemini outputs are used anywhere. ~98,074 examples total.
| Capability | Dataset | License | Provenance | N |
|---|---|---|---|---|
| Math | `open-r1/OpenR1-Math-220k` | Apache-2.0 | Human problems (NuminaMath); solutions by DeepSeek-R1 (open), kept only if Math-Verify–correct | 30,000 |
| Code | `bigcode/self-oss-instruct-sc2-exec-filter-50k` | ODC-BY | StarCoder2-15B (open), execution-filtered | 20,000 |
| Tool-calling | `argilla/Synth-APIGen-v0.1` | Apache-2.0 | Llama-3.1-70B + Qwen2.5-72B (open), exec-verified | 20,000 |
| Natural voice | `OpenAssistant/oasst1` | Apache-2.0 | Human (en+zh, best-ranked replies) | ~8,575 |
| Natural voice | `HuggingFaceH4/no_robots` | CC-BY-NC-4.0 ⚠️ | Human-written | ~9,499 |
| Emotion/natural | `Estwld/empathetic_dialogues_llm` | CC-BY-NC ⚠️ | Human dialogue | 10,000 |
⚠️ NonCommercial note: no_robots and empathetic_dialogues are CC-BY-NC. They are
included for the "talks like a human" goal; consequently the resulting weights inherit a
non-commercial restriction. (License of this derivative: cc-by-nc-4.0; an NC-clean rebuild is possible by dropping these two sources.)
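As a quick check of the data mix, the sketch below sums the per-dataset counts from the table and shows how many examples would remain in the NC-clean rebuild mentioned above. The approximate counts (`~8,575`, `~9,499`) are treated as exact here, which is an assumption.

```python
# Dataset sizes and licenses as listed in the training-data table.
# Counts marked "~" in the table are treated as exact (assumption).
SOURCES = {
    "open-r1/OpenR1-Math-220k": ("Apache-2.0", 30_000),
    "bigcode/self-oss-instruct-sc2-exec-filter-50k": ("ODC-BY", 20_000),
    "argilla/Synth-APIGen-v0.1": ("Apache-2.0", 20_000),
    "OpenAssistant/oasst1": ("Apache-2.0", 8_575),
    "HuggingFaceH4/no_robots": ("CC-BY-NC-4.0", 9_499),
    "Estwld/empathetic_dialogues_llm": ("CC-BY-NC", 10_000),
}
NON_COMMERCIAL = {"CC-BY-NC-4.0", "CC-BY-NC"}

total = sum(n for _, n in SOURCES.values())
nc_total = sum(n for lic, n in SOURCES.values() if lic in NON_COMMERCIAL)
nc_clean_total = total - nc_total          # size of an NC-clean rebuild
nc_share_percent = 100.0 * nc_total / total

print(f"total={total}, non-commercial={nc_total} ({nc_share_percent:.1f}%), "
      f"NC-clean={nc_clean_total}")
```

Under that assumption the total matches the 98,074 examples stated above, and roughly one fifth of the mix comes from the two non-commercial sources.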
Training configuration
- Framework: TRL `SFTTrainer` + PEFT LoRA, `transformers` 5.12.
- LoRA: r=32, α=64, dropout=0.05, bias=none.
- Optim: bf16, packing (max_len 4096), per-device batch 4 × grad-accum 8 (eff. 32), LR 1e-4 cosine, warmup 3%, 1 epoch, AdamW, grad-clip 1.0.
- Attention: SDPA (cuDNN attn disabled for head_dim=256); linear-attention layers via `flash-linear-attention` + `tilelang` (Hopper).
- Hardware: 1× NVIDIA H200 NVL (141 GB), ~6–7 h.
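The numbers in the configuration above imply a few derived quantities. The sketch below works them out and shows a linear-warmup plus cosine-decay learning-rate curve of the usual shape. It is a plain-Python illustration, not the trainer's own scheduler: the total step count is not stated in this card, so `total_steps` is a hypothetical input, and the α/r scaling assumes standard (non-rank-stabilized) LoRA.

```python
import math

PEAK_LR = 1e-4
WARMUP_RATIO = 0.03
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 8
MAX_LEN = 4096
LORA_R, LORA_ALPHA = 32, 64

# Derived quantities (single GPU, as listed under Hardware).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM        # sequences per optimizer step
max_tokens_per_step = effective_batch * MAX_LEN        # upper bound with packing
lora_scale = LORA_ALPHA / LORA_R                       # assumes standard LoRA scaling


def warmup_steps(total_steps: int) -> int:
    """Number of warmup steps for a given (hypothetical) run length."""
    return math.ceil(WARMUP_RATIO * total_steps)


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    warmup = warmup_steps(total_steps)
    if step < warmup:
        return PEAK_LR * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Example with a hypothetical run length of 1000 optimizer steps.
schedule = [lr_at(s, 1000) for s in range(0, 1001, 250)]
```

The effective batch of 32 matches the figure quoted above. With packing at 4096 tokens, one optimizer step sees at most 131,072 tokens.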
Evaluation — vs. independent base measurements (not the inflated official numbers)
Independent reproductions of Qwen3.6-35B-A3B land well below Qwen's official numbers — the official figures use Qwen-specific optimized eval prompts that standard harnesses don't replicate. (A direct re-test of the base on SuperGPQA gives 55.7, vs. the official 64.7.) So Zenith is compared against independent base measurements, not the official numbers:
| Benchmark | Base (independent)¹ | Zenith² | Δ |
|---|---|---|---|
| MMLU-Pro | 75.0 | 76.9 | +1.9 ✅ |
| SuperGPQA | 55.7 | 56.3 | +0.6 ✅ |
| GPQA-Diamond | 84.0 | 72.0 | −12.0 ⚠️ |
| AIME 2026 | — | 73.3 | — |
| MMStar (vision regression) | — | 52.7 | — |
¹ MMLU-Pro & GPQA-Diamond = Artificial Analysis's own independent evaluation; SuperGPQA = re-tested directly on the base Qwen3.6-35B-A3B. Official Qwen numbers (MMLU-Pro 85.2 / GPQA 86 / SuperGPQA 64.7) are not reproducible with standard harnesses.

² EvalScope, sampled subsets, single-sample (n=1).
Verdict — no meaningful regression. Where independent base numbers exist (MMLU-Pro, SuperGPQA), Zenith equals or slightly beats the base (76.9 vs 75.0; 56.3 vs 55.7). The one outlier — GPQA-Diamond 72 vs AA's 84 — is our small-sample EvalScope run (50 questions, n=1, ~6% answer-extraction failures), not real degradation: the other reasoning axes show no drop.
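To give a rough sense of how noisy a 50-question, single-sample run is, the sketch below computes a normal-approximation confidence interval for the GPQA-Diamond score. This is a crude estimate under stated assumptions: each question is treated as an independent Bernoulli trial, the answer-extraction failures are ignored, and the uncertainty of the independent base number is not modeled.

```python
import math


def binomial_interval(p: float, n: int, z: float = 1.96):
    """Standard error and normal-approximation interval for a proportion."""
    se = math.sqrt(p * (1.0 - p) / n)
    return se, p - z * se, p + z * se


# GPQA-Diamond: 72.0% on 50 questions, as reported above.
se, low, high = binomial_interval(0.72, 50)

print(f"standard error: {100 * se:.1f} points")
print(f"approx. 95% interval: {100 * low:.1f} to {100 * high:.1f}")
```

Under these assumptions the standard error is a little over 6 points, and the independent base figure of 84.0 falls just inside the upper end of the interval. A run over the full benchmark with several samples per question would be needed to settle the comparison.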
Conversational style — the primary goal ("talk like a human, not a robot")
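The empathy/voice SFT noticeably de-roboticizes the model in conversational and emotional contexts (thinking off; exchanges translated from Chinese):

User: Tell me what you really think about "involution" (内卷), the way you'd chat with a friend.
Zenith: Honestly, the word "involution" is everywhere these days, and it's really not young people being dramatic… It's like a crowd squeezing into an elevator that's already full, everyone still pushing to get in, and in the end nobody goes up… What gets worn down isn't your stamina, it's your spirit… Life isn't a hundred-meter sprint, right? Walk slowly and you'll still reach the finish line.

User: My boss criticized me in front of everyone today, and I feel pretty awful.
Zenith: Hearing that, I can tell you must be feeling pretty hurt and upset right now… First, here's a big hug~… You already put so much care into your work, and one setback won't hide what makes you shine… Take it slow, things will get better. 🌿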
GGUF quants (llama.cpp / LM Studio / Ollama)
Converted with convert_hf_to_gguf.py --no-mtp — the multi-token-prediction draft head is
excluded (it makes the GGUF fail to load otherwise: missing tensor 'blk.40.attn_norm.weight');
normal inference is unaffected. For image input, pass an mmproj file alongside the model.
| File | Size | Notes |
|---|---|---|
| `Qwen3.6-35B_Zenith-Q4_K_M.gguf` | 21 GB | recommended: best size/quality tradeoff |
| `Qwen3.6-35B_Zenith-Q6_K.gguf` | 28 GB | higher quality, near-lossless |
| `Qwen3.6-35B_Zenith-Q8_0.gguf` | 37 GB | near-lossless |
| `Qwen3.6-35B_Zenith-IQ2_XXS.gguf` | 9.5 GB | extreme low-memory (imatrix-calibrated) |
| `Qwen3.6-35B_Zenith-F16.gguf` | 69 GB | full precision (for re-quantizing) |
| `Qwen3.6-35B_Zenith-mmproj-f16.gguf` | 0.9 GB | vision projector; pair for image input |
| `Qwen3.6-35B_Zenith-mmproj-f32.gguf` | 1.8 GB | vision projector (f32) |
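One way to choose among the files above is to take the largest quant whose file size fits a memory budget. The helper below is a hypothetical sketch: the sizes come from the table, but the `headroom_gb` default is an assumed allowance for KV cache and runtime overhead, not a measured figure, and file size is only a rough proxy for actual memory use.

```python
# File sizes in GB, copied from the table above.
QUANTS_GB = {
    "Qwen3.6-35B_Zenith-IQ2_XXS.gguf": 9.5,
    "Qwen3.6-35B_Zenith-Q4_K_M.gguf": 21.0,
    "Qwen3.6-35B_Zenith-Q6_K.gguf": 28.0,
    "Qwen3.6-35B_Zenith-Q8_0.gguf": 37.0,
    "Qwen3.6-35B_Zenith-F16.gguf": 69.0,
}
MMPROJ_F16 = ("Qwen3.6-35B_Zenith-mmproj-f16.gguf", 0.9)


def pick_quant(budget_gb: float, headroom_gb: float = 4.0, with_vision: bool = False):
    """Return the files to load for a memory budget, or None if nothing fits.

    headroom_gb is an assumed allowance for KV cache and runtime overhead.
    """
    usable = budget_gb - headroom_gb
    if with_vision:
        usable -= MMPROJ_F16[1]  # the projector is loaded alongside the model
    fitting = [(size, name) for name, size in QUANTS_GB.items() if size <= usable]
    if not fitting:
        return None
    model_file = max(fitting)[1]  # largest file that still fits
    return [model_file, MMPROJ_F16[0]] if with_vision else [model_file]


choice = pick_quant(32.0, with_vision=True)
```

The helper only compares file sizes, so real headroom should be tuned to the context length and runtime in use.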
⚠️ Brand-new architecture caveat: Qwen3.6-35B-A3B (hybrid GatedDeltaNet linear-attention + full-attention MoE) is newly supported in llama.cpp. These GGUFs load and generate correct output, but the runtime kernels for this arch are still maturing upstream, so expect lower throughput than mature architectures until those kernels land. For full-speed serving, use the bf16 weights via vLLM / transformers.
Attribution / licenses
Derivative of Qwen/Qwen3.6-35B-A3B (Apache-2.0; NOTICE retained). Datasets under
Apache-2.0 / ODC-BY / CC-BY-NC-4.0 as listed (attribution preserved). Tools: TRL, PEFT,
transformers, flash-linear-attention, tilelang, vLLM, EvalScope — all open-source.