Qwen3.5-35B-A3B — VKAE Accelerated

Ready-to-run, VKAE-accelerated serving of Qwen3.5-35B-A3B (35B-parameter Mixture-of-Experts, ~3B active). Ships as a self-contained container — model weights and an optimized serving runtime in a single image — so anyone can reproduce the numbers on their own GPU.

VKAE (VIDRAFT Kernel Acceleration Engine) is VIDRAFT's proprietary inference-serving optimization. The acceleration recipe is withheld; only the reproducible results are published here.

Measured performance

Measured on a single NVIDIA B200 GPU in FP8, using the same benchmark harness for the baseline and VKAE runs.

Metric	Baseline	VKAE	Gain
Single-stream throughput	25.7 tok/s	601 tok/s	23.4×
Peak aggregate (high concurrency)	—	~10,516 tok/s	—
Output quality	reference	preserved	no degradation

On realistic, varied-content prompts, single-stream throughput is about 455 tok/s, compared with the 601 tok/s figure in the table. Accuracy is preserved end to end.
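
When comparing your own runs against the table, the arithmetic is simple: a tok/s figure is generated tokens divided by wall-clock time, and the Gain column is the ratio of the two throughputs. The Python sketch below shows that arithmetic only. The helper names are illustrative and not part of the container, and it assumes you obtain token counts yourself (for example from an OpenAI-style `usage.completion_tokens` field, if the server reports one).

```python
from typing import Sequence


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Single-stream throughput: generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return completion_tokens / elapsed_s


def aggregate_tokens_per_second(token_counts: Sequence[int], wall_clock_s: float) -> float:
    """Aggregate throughput: tokens from all concurrent requests over one shared time window."""
    if wall_clock_s <= 0:
        raise ValueError("wall_clock_s must be positive")
    return sum(token_counts) / wall_clock_s


def gain(baseline_tps: float, accelerated_tps: float) -> float:
    """Speedup factor of the accelerated run over the baseline run."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return accelerated_tps / baseline_tps


# Single-stream figures from the table: 25.7 tok/s baseline, 601 tok/s with VKAE.
print(f"{gain(25.7, 601):.1f}x")  # 23.4x
```

Note that aggregate throughput divides by the shared wall-clock window rather than summing per-request rates, since concurrent requests overlap in time.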

Quick start

docker pull vidraft/qwen35-vkae:601
docker run --gpus all -p 8000:8000 vidraft/qwen35-vkae:601

The container serves an OpenAI-compatible API on port 8000 — point any OpenAI client at http://localhost:8000/v1. A Blackwell (B200) or Hopper (H100/H200) class GPU is recommended.
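
The server may not answer immediately after `docker run`, since the model has to load first. A minimal polling sketch in Python follows, assuming a plain HTTP GET is an acceptable readiness probe. The `/v1/models` path in the usage note is an assumption borrowed from common OpenAI-compatible servers and is not confirmed for this image; the function names and timeouts are illustrative.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if a GET to url answers with HTTP 200, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are subclasses of OSError.
        return False


def wait_until_ready(probe: Callable[[], bool], timeout_s: float = 600.0,
                     interval_s: float = 2.0) -> bool:
    """Call probe repeatedly until it returns True or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With the container started, `wait_until_ready(lambda: http_ok("http://localhost:8000/v1/models"))` returns True once the (assumed) endpoint responds. Passing the probe as a callable makes it easy to swap in whatever health check the image actually provides.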

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen35-vkae","messages":[{"role":"user","content":"Hello!"}]}'
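
The same request can be sent from Python using only the standard library. This is a sketch under the assumptions of the curl example above: the base URL, the model name `qwen35-vkae`, and a response shaped like an OpenAI chat completion (`choices[0].message.content`). The function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # port published by `docker run -p 8000:8000`
MODEL = "qwen35-vkae"                  # model name used in the curl example


def build_chat_request(prompt: str, base_url: str = BASE_URL,
                       model: str = MODEL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response_json["choices"][0]["message"]["content"]


def chat(prompt: str, timeout_s: float = 60.0) -> str:
    """Send one prompt to the running container and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout_s) as resp:
        return extract_reply(json.load(resp))
```

With the container running, `print(chat("Hello!"))` prints the model's reply. Keeping request construction separate from sending lets you inspect the payload without a live server.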

📦 Ready-to-use files in this repo: Dockerfile, docker-compose.yml, run_docker.sh — pull-and-run, no build required.

Base model & license

Base weights: Qwen/Qwen3.5-35B-A3B (Apache-2.0), unmodified. This card documents VIDRAFT's accelerated serving of the model; the base model itself is unchanged. The VKAE acceleration method is proprietary and is not distributed in source form.

