Qwen3.5-35B-A3B: VKAE Accelerated

Ready-to-run, VKAE-accelerated serving of Qwen3.5-35B-A3B (35B-parameter Mixture-of-Experts, ~3B active). Ships as a self-contained container (model weights and an optimized serving runtime in a single image), so anyone can reproduce the numbers on their own GPU.

VKAE (VIDRAFT Kernel Acceleration Engine) is VIDRAFT's proprietary inference-serving optimization. The acceleration recipe is withheld; only the reproducible results are published here.

Measured performance

NVIDIA B200, single GPU, FP8, same-harness before/after.

| Metric | Baseline | VKAE | Gain |
|---|---|---|---|
| Single-stream throughput | 25.7 tok/s | 601 tok/s | 23.4× |
| Peak aggregate (high concurrency) | n/a | ~10,516 tok/s | n/a |
| Output quality | reference | preserved | no degradation |
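
The gain in the single-stream row follows directly from the two throughput figures. A minimal sketch of that arithmetic, using only the published numbers; the helper names and the 1,000-token reply are illustrative assumptions, not part of VKAE:

```python
# Figures copied from the single-stream row (B200, single GPU, FP8).
BASELINE_TOK_S = 25.7
VKAE_TOK_S = 601.0


def speedup(baseline_tok_s: float, accelerated_tok_s: float) -> float:
    """Ratio of accelerated to baseline throughput."""
    return accelerated_tok_s / baseline_tok_s


def seconds_for(tokens: int, tok_s: float) -> float:
    """Ideal generation time for `tokens` output tokens at a steady rate."""
    return tokens / tok_s


if __name__ == "__main__":
    print(f"gain: {speedup(BASELINE_TOK_S, VKAE_TOK_S):.1f}x")
    # Hypothetical 1,000-token reply; ignores prompt processing and network time.
    print(f"baseline: {seconds_for(1000, BASELINE_TOK_S):.1f} s")
    print(f"VKAE:     {seconds_for(1000, VKAE_TOK_S):.1f} s")
```

The time estimates assume a constant decode rate, so treat them as a rough lower bound on what a client would observe.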

On realistic, varied content, single-stream throughput is about 455 tok/s. Accuracy is preserved end to end.
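
This card does not describe the measurement harness, so the following is only one plausible way to estimate single-stream throughput yourself: time a single request and divide the generated-token count by the wall-clock time. It assumes the server reports the standard OpenAI `usage.completion_tokens` field; `measure` and `tokens_per_second` are hypothetical helper names.

```python
import time
from typing import Callable


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return completion_tokens / elapsed_s


def measure(send: Callable[[], dict]) -> float:
    """Time one blocking request and return its end-to-end tok/s.

    `send` is any function that performs the request and returns the parsed
    JSON response (assumed to follow the OpenAI chat-completions shape).
    """
    start = time.perf_counter()
    body = send()
    elapsed = time.perf_counter() - start
    return tokens_per_second(body["usage"]["completion_tokens"], elapsed)
```

In practice `send` would POST a chat request to the container described under Quick start. Because the timer also covers prompt processing and HTTP overhead, this figure will read somewhat lower than a pure decode rate, and it may not match the methodology behind the published numbers.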

Quick start

docker pull vidraft/qwen35-vkae:601
docker run --gpus all -p 8000:8000 vidraft/qwen35-vkae:601

The container serves an OpenAI-compatible API on port 8000; point any OpenAI client at http://localhost:8000/v1. A Blackwell (B200) or Hopper (H100/H200) class GPU is recommended.

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen35-vkae","messages":[{"role":"user","content":"Hello!"}]}'
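
The same request can be sent from Python without extra dependencies. This is a sketch using only the standard library; the URL and model name come from the curl example above, while the response shape (`choices[0].message.content`) is assumed from the OpenAI chat-completions format rather than confirmed by this card.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # endpoint from the quick start
MODEL = "qwen35-vkae"                  # model name used in the curl example


def build_chat_request(prompt: str, base_url: str = BASE_URL,
                       model: str = MODEL) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        body = json.load(resp)
    # Assumed OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("Hello!"))
```

Building the request separately from sending it keeps the payload easy to inspect or log before the container is up.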

📦 Ready-to-use files in this repo: Dockerfile, docker-compose.yml, run_docker.sh. Pull and run, no build required.

Base model & license

Base weights: Qwen/Qwen3.5-35B-A3B (Apache-2.0), unmodified. This card documents VIDRAFT's accelerated serving of the model; the base model itself is unchanged. The VKAE acceleration method is proprietary and is not distributed in source form.
