Libraries

How to use clzoro/Qwen3.5-4B-KIMI-Distill with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="clzoro/Qwen3.5-4B-KIMI-Distill")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("clzoro/Qwen3.5-4B-KIMI-Distill")
model = AutoModelForImageTextToText.from_pretrained("clzoro/Qwen3.5-4B-KIMI-Distill")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use clzoro/Qwen3.5-4B-KIMI-Distill with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "clzoro/Qwen3.5-4B-KIMI-Distill"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "clzoro/Qwen3.5-4B-KIMI-Distill",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
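
The same request can be sent from Python. The sketch below mirrors the curl payload using only the standard library; `build_chat_payload` and `send_chat_request` are hypothetical helper names, and the base URL assumes the vLLM server started above is listening on its default port 8000.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(base_url: str, payload: dict, timeout: float = 60.0) -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_payload(
    model="clzoro/Qwen3.5-4B-KIMI-Distill",
    prompt="Describe this image in one sentence.",
    image_url="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)

# Uncomment once the server is running (assumed address, adjust as needed):
# print(send_chat_request("http://localhost:8000", payload))
```

Keeping payload construction separate from the network call makes it easy to inspect or log the request before sending it.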

Use Docker

docker model run hf.co/clzoro/Qwen3.5-4B-KIMI-Distill

SGLang

How to use clzoro/Qwen3.5-4B-KIMI-Distill with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "clzoro/Qwen3.5-4B-KIMI-Distill" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "clzoro/Qwen3.5-4B-KIMI-Distill",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "clzoro/Qwen3.5-4B-KIMI-Distill" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use clzoro/Qwen3.5-4B-KIMI-Distill with Docker Model Runner:
```
docker model run hf.co/clzoro/Qwen3.5-4B-KIMI-Distill
```

Qwen3.5-4B-KIMI-Distill

A 4B-parameter reasoning-enhanced model, fine-tuned from Qwen3.5-4B on 554K high-quality reasoning traces distilled from KIMI-K2.5.

Model Highlights

Reasoning Enhancement: Trained on chain-of-thought reasoning traces distilled from KIMI-K2.5
Multi-Domain Coverage: Coding (60%), Science (15%), Math (10%), Computer Science (5%), Logical Reasoning (5%), Creative Writing (5%)
2B Reasoning Tokens: Extensive training on ~2B tokens of distilled reasoning data
Multimodal Capable: Inherits vision-language capabilities from Qwen3.5-4B

Model Description

Property	Value
Base Model	Qwen3.5-4B
Model Type	Causal Language Model with Vision Encoder
Parameters	4B
Languages	English, Chinese
License	Apache 2.0
Developer	Kassadin88

Training Data

This model was fine-tuned on KIMI-K2.5-550000x, a distilled reasoning dataset containing 554,381 high-quality samples with approximately 2B tokens of chain-of-thought reasoning traces.

Dataset Composition

Domain	Percentage	Description
Coding	60%	Web development, Python, C++, Java, JavaScript, C, Ruby, Lua, Rust, C#
Science	15%	Physics, Chemistry, Biology (includes 100K PhD-level science problems)
Mathematics	10%	Algebra, Calculus, Probability, Number Theory
Computer Science	5%	Algorithms, Data Structures, System Design
Logical Reasoning	5%	Deductive and inductive reasoning problems
Creative Writing	5%	Storytelling, narrative generation
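
As a rough illustration of what these shares mean in absolute terms, the sketch below converts the percentages into approximate sample counts and an average trace length. It assumes the percentages apply to sample counts, which the card does not state; they may refer to tokens instead. The 100K figure in the Science row is larger than 15% of 554,381, so the results are order-of-magnitude estimates only. The 2B token total is the card's approximate figure.

```python
TOTAL_SAMPLES = 554_381
TOTAL_TOKENS = 2_000_000_000  # "approximately 2B" per the card, so a rough value

# Domain shares from the Dataset Composition table, in percent
DOMAIN_SHARE_PERCENT = {
    "Coding": 60,
    "Science": 15,
    "Mathematics": 10,
    "Computer Science": 5,
    "Logical Reasoning": 5,
    "Creative Writing": 5,
}


def estimate_domain_counts(total: int, shares: dict) -> dict:
    """Estimate per-domain sample counts (assumes shares are by sample, not by token)."""
    return {domain: round(total * pct / 100) for domain, pct in shares.items()}


domain_counts = estimate_domain_counts(TOTAL_SAMPLES, DOMAIN_SHARE_PERCENT)
avg_tokens_per_sample = TOTAL_TOKENS / TOTAL_SAMPLES

for domain, count in domain_counts.items():
    print(f"{domain:<18} ~{count:,} samples")
print(f"Average trace length: ~{avg_tokens_per_sample:,.0f} tokens per sample")
```

Under these assumptions, the average sample carries a few thousand tokens of reasoning, which is worth keeping in mind when choosing `max_new_tokens` at inference time.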

Data Source

Distilled from KIMI-K2.5 on high-complexity reasoning tasks
Generated using a modified Datagen pipeline
Each sample includes detailed chain-of-thought reasoning traces

Benchmark Results

This model builds on the foundational capabilities of Qwen3.5-4B. The scores below are those of the base model, not of this fine-tune; no separate results for the distilled model are listed here.

Language Benchmarks

Category	Benchmark	Qwen3.5-4B
Knowledge & STEM	MMLU-Pro	79.1
Knowledge & STEM	MMLU-Redux	88.8
Knowledge & STEM	C-Eval	85.1
Instruction Following	IFEval	89.8
Reasoning & Coding	LiveCodeBench v6	55.8

Vision Language Benchmarks

Category	Benchmark	Qwen3.5-4B
STEM & Puzzle	MMMU	77.6
STEM & Puzzle	MathVista (mini)	85.1
Document Understanding	OCRBench	85.0

Note: For complete benchmark results across all categories, please refer to the Qwen3.5-4B model card.

Quick Start

Using Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Kassadin88/Qwen3.5-4B-KIMI-Distill"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful assistant with strong reasoning capabilities."},
    {"role": "user", "content": "Solve this step by step: A train travels 120 km in 2 hours. At the same speed, how long will it take to travel 300 km?"}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True
)

response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
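
Because the model is trained on chain-of-thought traces, the decoded `response` may contain the reasoning followed by the final answer. The sketch below separates the two. It assumes the reasoning is delimited by `<think>` and `</think>` tags, as in other Qwen reasoning models; this card does not confirm the output format, and `split_reasoning` is a hypothetical helper, not part of any library.

```python
def split_reasoning(text: str, open_tag: str = "<think>", close_tag: str = "</think>"):
    """Split generated text into (reasoning, answer).

    Assumes reasoning ends at `close_tag`. The opening tag is optional because
    some chat templates place it in the prompt rather than in the output.
    If no closing tag is found, the whole text is treated as the answer.
    """
    head, separator, tail = text.partition(close_tag)
    if not separator:
        return "", text.strip()
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


# Illustrative string, not real model output
example = "<think>120 km / 2 h = 60 km/h, so 300 / 60 = 5</think>\nIt takes 5 hours."
reasoning, answer = split_reasoning(example)
print(answer)
```

If the model's actual delimiters differ, pass them through the `open_tag` and `close_tag` parameters.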

Using vLLM (Recommended for Production)

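from vllm import LLM, SamplingParams

llm = LLM(
    model="Kassadin88/Qwen3.5-4B-KIMI-Distill",
    trust_remote_code=True,
    dtype="bfloat16"
)

sampling_params = SamplingParams(
    max_tokens=2048
)

# Example prompt (illustrative); generate() takes a list of prompt strings
prompts = ["Explain step by step why the sum of two odd numbers is always even."]

outputs = llm.generate(prompts, sampling_params)

# Each result holds one or more completions; print the first for each prompt
for output in outputs:
    print(output.outputs[0].text)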

Using SGLang

python -m sglang.launch_server \
    --model-path Kassadin88/Qwen3.5-4B-KIMI-Distill \
    --port 8000 \
    --tp-size 1

OpenAI-Compatible API

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY"
)

response = client.chat.completions.create(
    model="Kassadin88/Qwen3.5-4B-KIMI-Distill",
    messages=[
        {"role": "user", "content": "Write a Python function to find the longest palindromic substring."}
    ],
    max_tokens=1024
)
print(response.choices[0].message.content)

Usage Tips

For Mathematical Reasoning

messages = [
    {"role": "user", "content": "Solve: Find all prime numbers p such that p² + 2 is also prime."}
]
# Model will provide step-by-step reasoning with chain-of-thought
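
Since the model may occasionally produce incorrect reasoning, answers to prompts like this one are worth checking independently. For this problem the expected result is p = 3: every other prime p satisfies p² ≡ 1 (mod 3), so p² + 2 is divisible by 3 and greater than 3. The brute-force sketch below confirms this over a small range; the function names and the limit of 1000 are arbitrary choices for illustration.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def primes_with_prime_square_plus_two(limit: int) -> list:
    """Return all primes p < limit for which p**2 + 2 is also prime."""
    return [p for p in range(2, limit) if is_prime(p) and is_prime(p * p + 2)]


solutions = primes_with_prime_square_plus_two(1000)
print(solutions)  # [3]
```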

For Code Generation

messages = [
    {"role": "user", "content": "Implement an LRU cache in Python with O(1) get and put operations."}
]
# Model will generate well-structured code with explanations
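
For comparison when reviewing the model's answer, one common way to meet the O(1) requirement is to build on `collections.OrderedDict`. The sketch below is a reference point under that design choice, not the output of this model; the class name and the `default` parameter are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache with O(1) average-time get and put."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        """Return the cached value and mark the key as most recently used."""
        if key not in self._items:
            return default
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value) -> None:
        """Insert or update a key, evicting the least recently used on overflow."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the oldest entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used key
cache.put("c", 3)  # capacity exceeded, so "b" is evicted
```

A generated solution that uses a hash map plus a doubly linked list is equally valid; what matters is that both operations avoid scanning the stored items.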

For Scientific Reasoning

messages = [
    {"role": "user", "content": "Explain the mechanism of CRISPR-Cas9 gene editing and its applications."}
]
# Model will provide detailed scientific explanations

Limitations

The model is trained primarily on reasoning tasks and may not perform optimally on creative or open-ended conversational tasks.
It may occasionally generate incorrect reasoning steps or conclusions.
It should not be used for medical, legal, or financial advice without verification.
Its knowledge is limited to what is present in the training data.

Citation

@misc{qwen3.5-4b-kimi-distill,
  author = {Kassadin88},
  title = {Qwen3.5-4B-KIMI-Distill: A Reasoning-Enhanced Model Distilled from KIMI-K2.5},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Kassadin88/Qwen3.5-4B-KIMI-Distill}
}

Acknowledgments

Base Model: Qwen Team for Qwen3.5-4B
Training Data: ianncity for KIMI-K2.5-550000x dataset
Training Framework: MS-Swift

Note: This model is intended for research and educational purposes. Please use responsibly.
