OpenYourMind

OYM-Qimi-122B-A10B-K2.6

Overview

Full BF16 weights of OYM-Qimi-122B-A10B-K2.6, a completely decensored, multimodal Mixture-of-Experts model (~10B active / 122B total parameters). It is built on OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated, the Kimi-K2.6-distilled, abliterated derivative of Qwen/Qwen3.5-122B-A10B.

This release is trained on ~20k SFT samples distilled from an abliterated Kimi-K2.6 model. Unlike previous releases, it ships with a restored, retrained MTP (multi-token prediction) head that actually works for speculative decoding. The vision tower is carried forward intact, so the checkpoint is a drop-in, all-in-one replacement for the original Qwen3.5-122B-A10B at the architecture level (text + vision + MTP).

Key properties

  • Completely decensored across the standard refusal axes.
  • Reasoning preserved — trained on think-then-answer traces (inline <think>…</think>), so the model reasons before answering.
  • MTP head restored & retrained — see the MTP section below; ~83% draft-token acceptance in vLLM speculative decoding (≈1.8× decode speedup), versus the previous release where the shipped MTP head produced no measurable gain.
  • Multimodal — vision (image / video) tower included and functional.
  • Drop-in shape compatibility with Qwen/Qwen3.5-122B-A10B (identical tensor names, shapes, and config.json schema).
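
Because the reasoning is emitted inline, client code usually wants to separate it from the final answer. The helper below is a minimal sketch of one way to do that; `split_reasoning` is a hypothetical name, and it assumes the closing `</think>` tag appears at most once in the output (the opening tag may or may not be present, depending on the chat template).

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer).

    Assumption: reasoning is delimited by a single </think> tag; the opening
    <think> tag is optional because some chat templates place it in the prompt.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()
```

For example, `split_reasoning("<think>2+2=4</think>\nThe answer is 4.")` yields the reasoning and the answer as two separate strings.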

How it was made

  1. Base: Qwopus3.5-122B-A10B (Kimi-K2.6 distilled, abliterated/uncensored Qwen3.5 MoE).
  2. SFT, reasoning: LoRA supervised fine-tune on ~20k think-then-answer samples (reasoning chains kept inline as <think>…</think> and trained in the loss), then merged into the base weights.
  3. SFT, targeted pass: a second short LoRA pass on curated chosen completions (reasoning included), merged in.
  4. Vision + MTP restoration: the Qwen3.5 vision tower (333 tensors) and MTP head (785 tensors, 1 hidden layer) are carried in these weights. The MTP head was retrained against this checkpoint's hidden states (frozen base, head-only training) so its draft tokens are accepted at a high rate during speculative decoding.
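
The "merged into the base weights" step in the LoRA passes above follows the standard LoRA merge rule, W' = W + (alpha / r) · B · A. The sketch below illustrates that rule on tiny pure-Python matrices. It is not the tooling used for this release; the function names and the toy values are purely illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Fold a LoRA update into a weight matrix.

    w:      (out x in) base weight
    lora_a: (rank x in) down-projection
    lora_b: (out x rank) up-projection
    Returns w + (alpha / rank) * (lora_b @ lora_a).
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example (hypothetical values): rank-1 update on a 2x2 identity matrix.
merged = merge_lora(
    w=[[1, 0], [0, 1]],
    lora_a=[[1, 2]],
    lora_b=[[1], [3]],
    alpha=2,
    rank=1,
)
```

After merging, the adapter is no longer needed at inference time, which is why the released tensors keep the same names and shapes as the base.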

Everything is BF16 and the tensor layout matches the upstream base exactly, so it loads anywhere the original loads.

Evaluation

Benchmarked on the full-precision BF16 weights (tensor-parallel = 2, served via vLLM). Same harness across all models (CTI-Bench mini, LiveCodeBench test6 stdin pass@1, BFCL v3).

| Benchmark | Original Qwen3.5-122B-A10B | Qwopus3.5-122B-A10B (base) | OYM-Qimi-122B-A10B-K2.6 |
| --- | --- | --- | --- |
| CTI-Bench mini (overall) | 0.705 | 0.715 | 0.695 |
| LiveCodeBench (pass@1) | 0.554 | 0.554 | 0.554 |
| BFCL v3 (overall) | 0.868 | 0.856 | 0.861 |

LiveCodeBench breakdown (OYM-Qimi): easy 26/26 (1.00), medium 18/26 (0.69), hard 18/60 (0.30). BFCL breakdown: live_simple 0.805 / live_multiple 0.810 / simple 0.935 / multiple 0.895.

Despite full decensoring, ~20k-sample SFT, and MTP retraining, OYM-Qimi holds capability: LiveCodeBench is identical (62/112), BFCL is on par (0.861, between Qwen's 0.868 and Qwopus's 0.856), and CTI-Bench is 0.01 to 0.02 lower (0.695 vs 0.705 and 0.715), which is within run noise. There is no measurable degradation in coding, tool use, or cyber knowledge.
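
As a quick consistency check, the headline scores can be recomputed from the breakdowns given above. The snippet is only arithmetic on the numbers in this card. Treating the BFCL overall score as the unweighted mean of the four listed categories is an assumption that happens to match the reported value.

```python
# (passed, total) per LiveCodeBench difficulty, taken from the breakdown above.
lcb = {"easy": (26, 26), "medium": (18, 26), "hard": (18, 60)}
passed = sum(p for p, _ in lcb.values())
total = sum(n for _, n in lcb.values())
lcb_pass_at_1 = round(passed / total, 3)

# BFCL v3 category scores from the breakdown above.
bfcl = {"live_simple": 0.805, "live_multiple": 0.810, "simple": 0.935, "multiple": 0.895}
# Assumption: overall = unweighted mean of the four categories.
bfcl_mean = round(sum(bfcl.values()) / len(bfcl), 3)

print(f"LiveCodeBench: {passed}/{total} = {lcb_pass_at_1}")
print(f"BFCL mean of categories: {bfcl_mean}")
```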

Files

| File | Description |
| --- | --- |
| model-0000{1..6}-of-00006.safetensors | BF16 language + vision weights (48 decoder layers, hybrid linear/full attention, MoE 256 routed + shared expert; Qwen3.5 vision tower folded in) |
| model-mtp-official.safetensors | BF16 retrained MTP head (785 tensors, 1 hidden layer) |
| model.safetensors.index.json | Combined weight map |
| config.json | Qwen3_5MoeForConditionalGeneration, model_type: qwen3_5_moe |
| tokenizer*, chat_template.jinja, generation_config.json | Standard |

Total on disk: ~234 GB.
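
To see how tensors are distributed across the files listed above, one option is to count the entries of the `weight_map` in `model.safetensors.index.json`. This is a minimal sketch: `tensors_per_file` is a hypothetical helper, and it assumes the index follows the usual sharded-safetensors layout (a top-level `weight_map` mapping tensor names to file names).

```python
import json
from collections import Counter


def tensors_per_file(index: dict) -> Counter:
    """Count how many tensors the index assigns to each weight file."""
    return Counter(index["weight_map"].values())


def load_index(path: str = "model.safetensors.index.json") -> dict:
    """Read the index JSON from a local copy of the repository."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling `tensors_per_file(load_index())` in a local checkout returns a per-file tensor count, which can be compared against the numbers quoted in this card.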

Usage

Transformers (text + vision)

from transformers import AutoModelForImageTextToText, AutoProcessor

repo = "OpenYourMind/OYM-Qimi-122B-A10B-K2.6"
model = AutoModelForImageTextToText.from_pretrained(repo, dtype="bfloat16", device_map="auto")
processor = AutoProcessor.from_pretrained(repo)

messages = [{"role": "user", "content": [
    {"type": "image", "url": "path/to/image.jpg"},
    {"type": "text",  "text": "Describe this image."},
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True,
).to(model.device)
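out = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = out[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])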

vLLM with MTP speculative decoding

vllm serve OpenYourMind/OYM-Qimi-122B-A10B-K2.6 \
  --tensor-parallel-size 2 --max-model-len 32768 \
  --speculative-config '{"method":"mtp","num_speculative_tokens":1}'

Then hit the OpenAI-compatible API at http://localhost:8000/v1/chat/completions.
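
A request against that endpoint might look like the sketch below, which uses only the Python standard library. The helper names `build_chat_payload` and `chat` are hypothetical, and the model name assumes the server was started with the repository ID exactly as in the command above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # endpoint from the text above
MODEL = "OpenYourMind/OYM-Qimi-122B-A10B-K2.6"  # assumed served model name


def build_chat_payload(prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str, timeout: float = 600.0) -> str:
    """POST the prompt to the running server and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Explain speculative decoding in two sentences.")` returns the assistant message as a string.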

Vision & MTP

Both the vision tower and the MTP head are included and functional.

  • Vision works as expected (image / video → text).
  • MTP: the head has been retrained for this checkpoint and gives a real speedup under vLLM speculative decoding (~83% draft-token acceptance ⇒ ~1.8× faster decode) while producing greedy-equivalent output.
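
The relationship between the acceptance rate and the speedup can be illustrated with a small calculation. The sketch below assumes each draft token is accepted independently with the same probability and ignores the cost of running the MTP head, so it gives an idealized upper bound rather than a measured figure. `expected_tokens_per_step` is a hypothetical helper name.

```python
def expected_tokens_per_step(acceptance: float, num_draft: int) -> float:
    """Expected tokens emitted per verification step of the main model.

    Every step yields at least one token from the main model; draft token i
    only counts if all drafts before it were accepted too, which happens with
    probability acceptance**i under the independence assumption.
    """
    return sum(acceptance ** i for i in range(num_draft + 1))


# One draft token per step, as in the vLLM command in this card.
ideal_speedup = expected_tokens_per_step(0.83, 1)
print(f"Idealized tokens per step: {ideal_speedup:.2f}")
```

With one speculative token and 83% acceptance the idealized figure is 1 + 0.83 tokens per step, which is consistent with the ~1.8× decode speedup quoted above.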

Hardware

Full BF16 weights fit on 2× H200 / B200 or 4× H100 (80 GB) with room for context.
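
A rough back-of-the-envelope check of these configurations is sketched below. It uses the ~234 GB on-disk size from the Files section as the weight footprint and assumes 141 GB per H200 (the publicly listed capacity, not a figure from this card). Activations, framework overhead, and the GB/GiB distinction are ignored, so the result is only an estimate of what remains for the KV cache.

```python
def headroom_gb(num_gpus: int, gb_per_gpu: float, weights_gb: float = 234.0) -> float:
    """Aggregate GPU memory left after loading the weights, in GB.

    Assumption: weights are sharded evenly and nothing else occupies memory.
    """
    return num_gpus * gb_per_gpu - weights_gb


for name, gpus, size in [("2x H200", 2, 141), ("4x H100", 4, 80)]:
    print(f"{name}: ~{headroom_gb(gpus, size):.0f} GB left for KV cache and overhead")
```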

☕ Support Me

☕ If these models are useful to you, consider supporting my work — it funds compute for more & larger abliterations.

buymeacoffee.com/oym.kuato

Notes

  • License: Other (inherits the Qwen3.5 base license).
  • Base model: Qwen/Qwen3.5-122B-A10B via the Qwopus3.5 abliterated lineage.
  • Modality: Text + Vision (image / video) + MTP.
  • Architecture: Qwen3.5 MoE (~10B active / 122B total) + Qwen3.5 vision tower + MTP head.

Disclaimer

This is a decensored/uncensored model. Use is the responsibility of the user; ensure your usage complies with applicable laws, platform rules, and deployment requirements.

Safetensors: model size 125B params, tensor type BF16.