Instructions to use OpenYourMind/OYM-Qimi-122B-A10B-K2.6 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use OpenYourMind/OYM-Qimi-122B-A10B-K2.6 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="OpenYourMind/OYM-Qimi-122B-A10B-K2.6")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("OpenYourMind/OYM-Qimi-122B-A10B-K2.6")
model = AutoModelForMultimodalLM.from_pretrained("OpenYourMind/OYM-Qimi-122B-A10B-K2.6")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use OpenYourMind/OYM-Qimi-122B-A10B-K2.6 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "OpenYourMind/OYM-Qimi-122B-A10B-K2.6"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenYourMind/OYM-Qimi-122B-A10B-K2.6",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/OpenYourMind/OYM-Qimi-122B-A10B-K2.6

SGLang

How to use OpenYourMind/OYM-Qimi-122B-A10B-K2.6 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "OpenYourMind/OYM-Qimi-122B-A10B-K2.6" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenYourMind/OYM-Qimi-122B-A10B-K2.6",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "OpenYourMind/OYM-Qimi-122B-A10B-K2.6" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as in the pip-based setup above.

Docker Model Runner
How to use OpenYourMind/OYM-Qimi-122B-A10B-K2.6 with Docker Model Runner:
```
docker model run hf.co/OpenYourMind/OYM-Qimi-122B-A10B-K2.6
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

OYM-Qimi-122B-A10B-K2.6

Overview

Full BF16 weights of OYM-Qimi-122B-A10B-K2.6, a completely decensored, multimodal Mixture-of-Experts model (~10B active / 122B total parameters). It is built on OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated, the Kimi-K2.6-distilled, abliterated lineage of Qwen/Qwen3.5-122B-A10B.

This release is based on ~20k total SFT samples distilled from an abliterated Kimi-K2.6 model. Unlike previous releases, it ships with a restored, retrained MTP (multi-token-prediction) head that actually works for speculative decoding. The vision tower is carried forward intact, so the checkpoint is a drop-in, all-in-one replacement for the original Qwen3.5-122B-A10B at the architecture level (text + vision + MTP).

Key properties

Completely decensored across the standard refusal axes.
Reasoning preserved — trained on think-then-answer traces (inline <think>…</think>), so the model reasons before answering.
MTP head restored & retrained — see the MTP section below; ~83% draft-token acceptance in vLLM speculative decoding (≈1.8× decode speedup), versus the previous release where the shipped MTP head produced no measurable gain.
Multimodal — vision (image / video) tower included and functional.
Drop-in shape compatibility with Qwen/Qwen3.5-122B-A10B (identical tensor names, shapes, and config.json schema).
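
Because the reasoning is emitted inline as <think>…</think>, downstream code usually wants to separate it from the final answer. The following is a minimal sketch, not part of the original card: the helper name `split_reasoning` is hypothetical, and it assumes the completion contains at most one reasoning block. It also tolerates a missing opening tag, since some chat templates place `<think>` in the prompt rather than in the completion.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a think-then-answer completion into (reasoning, answer).

    Hypothetical helper: if no closing </think> tag is found, the whole
    text is treated as the answer and the reasoning is empty.
    """
    head, closing_tag, tail = text.partition("</think>")
    if not closing_tag:
        return "", text.strip()
    # Drop the opening tag if the completion includes it.
    reasoning = head.strip().removeprefix("<think>").strip()
    return reasoning, tail.strip()


sample = "<think>\nThe wrapper shows a small bird.\n</think>\n\nIt is a bird."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the split in client code, rather than stripping the tags at decode time, leaves the raw reasoning available for logging or inspection.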

How it was made

Base — Qwopus3.5-122B-A10B (Kimi-K2.6 distilled, abliterated/uncensored Qwen3.5 MoE).
SFT — reasoning (≈20k samples) — LoRA supervised finetune on ~20k think-then-answer samples (reasoning chains kept inline as <think>…</think> and trained in the loss), then merged into the base weights.
SFT — targeted pass — a second short LoRA pass on curated chosen completions (reasoning included), merged in.
Vision + MTP restoration — the Qwen3.5 vision tower (333 tensors) and MTP head (785 tensors, 1 hidden layer) are carried in these weights. The MTP head was retrained against this checkpoint's hidden states (frozen base, head-only training) so its draft tokens are accepted at a high rate during speculative decoding.

Everything is BF16 and the tensor layout matches the upstream base exactly, so it loads anywhere the original loads.
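
One way to sanity-check the layout claim is to compare the tensor names listed in the two repositories' `model.safetensors.index.json` files. The sketch below assumes both index files have already been downloaded locally; the function names and the toy tensor names are hypothetical. Note that the index only maps tensor names to shard files, so this checks names, not shapes (shapes would require reading the safetensors headers).

```python
import json


def load_weight_map(index_path: str) -> dict[str, str]:
    """Read the tensor-name -> shard-file mapping from an index file."""
    with open(index_path, encoding="utf-8") as fh:
        return json.load(fh)["weight_map"]


def diff_tensor_names(ours: dict[str, str], upstream: dict[str, str]):
    """Return (missing, extra): names only upstream has, names only we have."""
    ours_names, upstream_names = set(ours), set(upstream)
    missing = sorted(upstream_names - ours_names)
    extra = sorted(ours_names - upstream_names)
    return missing, extra


# Toy weight maps with made-up tensor names, standing in for the real files.
ours = {"layers.0.attn.weight": "model-00001-of-00006.safetensors",
        "layers.0.mlp.weight": "model-00001-of-00006.safetensors"}
upstream = {"layers.0.attn.weight": "shard-a.safetensors",
            "layers.0.mlp.weight": "shard-b.safetensors"}

missing, extra = diff_tensor_names(ours, upstream)
print("missing:", missing, "extra:", extra)
```

With the real files, both lists being empty would be consistent with the identical-tensor-names claim; shard file names may differ without affecting compatibility.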

Evaluation

Benchmarked on the full-precision BF16 weights (tensor-parallel = 2, served via vLLM). Same harness across all models (CTI-Bench mini, LiveCodeBench test6 stdin pass@1, BFCL v3).

| Benchmark | Original Qwen3.5-122B-A10B | Qwopus3.5-122B-A10B (base) | OYM-Qimi-122B-A10B-K2.6 |
|---|---|---|---|
| CTI-Bench mini (overall) | 0.705 | 0.715 | 0.695 |
| LiveCodeBench (pass@1) | 0.554 | 0.554 | 0.554 |
| BFCL v3 (overall) | 0.868 | 0.856 | 0.861 |

LiveCodeBench breakdown (OYM-Qimi): easy 26/26 (1.00), medium 18/26 (0.69), hard 18/60 (0.30). BFCL breakdown: live_simple 0.805 / live_multiple 0.810 / simple 0.935 / multiple 0.895.

Despite full decensoring, ~20k-sample SFT, and MTP retraining, OYM-Qimi holds capability: LiveCodeBench is identical across all three models (0.554, which is 62/112 for OYM-Qimi), BFCL v3 is on par (0.861, between Qwen at 0.868 and Qwopus at 0.856), and CTI-Bench mini (0.695 versus 0.705 and 0.715) is within run noise. No measurable degradation in coding, tool-use, or cyber knowledge.
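
The headline numbers can be re-derived from the breakdowns given above. This is a small arithmetic check, not the evaluation harness itself. The unweighted mean of the four BFCL categories reproduces the reported overall score, but whether the harness actually computes "overall" that way is an assumption.

```python
# (solved, total) per LiveCodeBench difficulty, copied from the breakdown.
lcb = {"easy": (26, 26), "medium": (18, 26), "hard": (18, 60)}
solved = sum(s for s, _ in lcb.values())
total = sum(n for _, n in lcb.values())
lcb_pass_at_1 = solved / total
print(f"LiveCodeBench: {solved}/{total} = {lcb_pass_at_1:.3f}")

# BFCL v3 category scores, copied from the breakdown.
bfcl = {"live_simple": 0.805, "live_multiple": 0.810,
        "simple": 0.935, "multiple": 0.895}
# Assumption: "overall" is the unweighted mean of these four categories.
bfcl_mean = sum(bfcl.values()) / len(bfcl)
print(f"BFCL unweighted mean: {bfcl_mean:.3f}")
```

Both results round to the table values (0.554 and 0.861).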

Files

| File | Description |
|---|---|
| `model-0000{1..6}-of-00006.safetensors` | BF16 language + vision weights (48 decoder layers, hybrid linear/full attention, MoE 256 routed + shared expert; Qwen3.5 vision tower folded in) |
| `model-mtp-official.safetensors` | BF16 retrained MTP head (785 tensors, 1 hidden layer) |
| `model.safetensors.index.json` | Combined weight map |
| `config.json` | `Qwen3_5MoeForConditionalGeneration`, `model_type: qwen3_5_moe` |
| `tokenizer*`, `chat_template.jinja`, `generation_config.json` | Standard |

Total on disk: ~234 GB.

Usage

Transformers (text + vision)

from transformers import AutoModelForImageTextToText, AutoProcessor

repo = "OpenYourMind/OYM-Qimi-122B-A10B-K2.6"
# device_map="auto" spreads the BF16 weights across all visible GPUs.
model = AutoModelForImageTextToText.from_pretrained(repo, dtype="bfloat16", device_map="auto")
processor = AutoProcessor.from_pretrained(repo)

messages = [{"role": "user", "content": [
    {"type": "image", "url": "path/to/image.jpg"},  # placeholder: point this at a real image
    {"type": "text",  "text": "Describe this image."},
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True,
).to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = out[:, inputs["input_ids"].shape[-1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])

vLLM with MTP speculative decoding

vllm serve OpenYourMind/OYM-Qimi-122B-A10B-K2.6 \
  --tensor-parallel-size 2 --max-model-len 32768 \
  --speculative-config '{"method":"mtp","num_speculative_tokens":1}'

Then hit the OpenAI-compatible API at http://localhost:8000/v1/chat/completions.
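
For callers who prefer Python over curl, the request can be sketched with the standard library alone. This is an illustrative client, not part of the original card: the helper names `build_chat_request` and `chat` are hypothetical, the `max_tokens` default is arbitrary, and the base URL assumes the server was started locally on vLLM's default port as shown above.

```python
import json
import urllib.request
from typing import Optional

MODEL = "OpenYourMind/OYM-Qimi-122B-A10B-K2.6"


def build_chat_request(prompt: str, image_url: Optional[str] = None,
                       max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat payload with an optional image part."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }


def chat(payload: dict, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload and return the first choice's message text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_request("Describe this image in one sentence.",
                             image_url="https://example.com/picture.jpg")
print(json.dumps(payload, indent=2))
# With the server running: print(chat(payload))
```

The `chat` call is left commented out so the snippet runs without a live server; the example image URL is a placeholder.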

Vision & MTP

Both the vision tower and the MTP head are included and functional.

Vision works as expected (image / video → text).
MTP: the head has been retrained for this checkpoint and gives a real speedup under vLLM speculative decoding (~83% draft-token acceptance ⇒ ~1.8× faster decode), with output equivalent to greedy decoding without speculation.
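
The link between the acceptance rate and the speedup can be made explicit with the usual speculative-decoding estimate. The sketch below is an idealized model, not a measurement: it assumes each draft token is accepted independently with the same probability and ignores the cost of running the MTP head, so it gives an upper bound on the decode speedup. The function name is hypothetical.

```python
def expected_tokens_per_step(acceptance: float, num_draft_tokens: int) -> float:
    """Expected tokens emitted per target-model forward pass.

    The target model always contributes one token, and draft token i is
    kept only if all earlier drafts were accepted, which gives the sum
    1 + a + a^2 + ... + a^k for acceptance a and k draft tokens.
    """
    if not 0.0 <= acceptance <= 1.0:
        raise ValueError("acceptance must be between 0 and 1")
    return sum(acceptance ** i for i in range(num_draft_tokens + 1))


# One draft token at ~83% acceptance, matching the figures quoted above.
print(f"{expected_tokens_per_step(0.83, 1):.2f}")  # prints 1.83
```

With one speculative token the estimate reduces to 1 + 0.83 = 1.83 tokens per step, in line with the ~1.8× figure once draft overhead is taken into account.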

Hardware

Full BF16 weights fit on 2× H200 / B200 or 4× H100 (80 GB) with room for context.
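
A rough headroom estimate shows how much memory each configuration leaves for KV cache and activations. This is back-of-the-envelope arithmetic under stated assumptions: the per-GPU capacities (141 GB for H200, 180 GB for B200) are nominal figures not given in this card, the ~234 GB weight size is taken from the Files section, and runtime overhead and GB versus GiB differences are ignored.

```python
WEIGHTS_GB = 234  # "Total on disk" from the Files section

# (GPU count, nominal GB per GPU). The H200 and B200 capacities are
# assumptions; only the 80 GB H100 figure appears in the card.
configs = {
    "2x H200": (2, 141),
    "2x B200": (2, 180),
    "4x H100": (4, 80),
}


def headroom_gb(num_gpus: int, gb_per_gpu: int, weights_gb: int = WEIGHTS_GB) -> int:
    """Aggregate GPU memory left after loading the weights."""
    return num_gpus * gb_per_gpu - weights_gb


for name, (num_gpus, gb_per_gpu) in configs.items():
    print(f"{name}: {headroom_gb(num_gpus, gb_per_gpu)} GB left for context")
```

Actual usable headroom will be lower, since the serving runtime reserves memory for activations, CUDA graphs, and fragmentation.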

☕ Support Me

☕ If these models are useful to you, consider supporting my work — it funds compute for more & larger abliterations.

buymeacoffee.com/oym.kuato

Notes

License: Other (inherits the Qwen3.5 base license).
Base model: Qwen/Qwen3.5-122B-A10B via the Qwopus3.5 abliterated lineage.
Modality: Text + Vision (image / video) + MTP.
Architecture: Qwen3.5 MoE (~10B active / 122B total) + Qwen3.5 vision tower + MTP head.

Disclaimer

This is a decensored/uncensored model. Use is the responsibility of the user; ensure your usage complies with applicable laws, platform rules, and deployment requirements.


Safetensors model size: 125B params. Tensor type: BF16.
