OYM-Qimi-122B-A10B-K2.6
Overview
Full BF16 weights of OYM-Qimi-122B-A10B-K2.6 — a completely decensored, multimodal Mixture-of-Experts model (~10B active / 122B total) built on top of the Kimi-K2.6-distilled, abliterated OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated lineage of Qwen/Qwen3.5-122B-A10B.
This release is based on ~20k total SFT samples distilled from a Kimi 2.6 abliterated model, and — unlike previous releases — ships with a restored, retrained MTP (multi-token-prediction) head that actually works for speculative decoding. The vision tower is carried forward intact, so the checkpoint is a drop-in, all-in-one replacement for the original Qwen3.5-122B-A10B at the architecture level (text + vision + MTP).
Key properties
- Completely decensored across the standard refusal axes.
- Reasoning preserved: trained on think-then-answer traces (inline `<think>…</think>`), so the model reasons before answering.
- MTP head restored and retrained: see the Vision & MTP section below; ~83% draft-token acceptance in vLLM speculative decoding (≈1.8× decode speedup), versus the previous release, where the shipped MTP head produced no measurable gain.
- Multimodal — vision (image / video) tower included and functional.
- Drop-in shape compatibility with `Qwen/Qwen3.5-122B-A10B` (identical tensor names, shapes, and `config.json` schema).
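Because the reasoning is emitted inline, client code usually wants to separate it from the final answer. The following is a minimal sketch, assuming the decoded completion carries the `<think>…</think>` block as described above; `split_reasoning` is a hypothetical helper name, not part of any library, and the fallback for a missing opening tag is an assumption about how some chat templates behave.

```python
import re

# Assumption: reasoning appears inline as <think>...</think> before the answer.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a decoded completion."""
    match = _THINK_RE.search(text)
    if match:
        # Everything after the closing tag is treated as the answer.
        return match.group(1).strip(), text[match.end():].strip()
    # Some chat templates open the tag inside the prompt, so the completion
    # may contain only the closing tag.
    if "</think>" in text:
        reasoning, _, answer = text.partition("</think>")
        return reasoning.strip(), answer.strip()
    # No reasoning block found: the whole text is the answer.
    return "", text.strip()


reasoning, answer = split_reasoning("<think>2+2 is 4</think>The answer is 4.")
print(answer)
```

If the completion is decoded with `skip_special_tokens=True`, check first whether the tags survive decoding with this tokenizer; that detail is not stated in this card.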
How it was made
- Base: `Qwopus3.5-122B-A10B` (Kimi-K2.6 distilled, abliterated/uncensored Qwen3.5 MoE).
- SFT, reasoning (≈20k samples): LoRA supervised fine-tune on ~20k think-then-answer samples (reasoning chains kept inline as `<think>…</think>` and trained in the loss), then merged into the base weights.
- SFT, targeted pass: a second short LoRA pass on curated chosen completions (reasoning included), merged in.
- Vision + MTP restoration: the Qwen3.5 vision tower (333 tensors) and MTP head (785 tensors, 1 hidden layer) are carried in these weights. The MTP head was retrained against this checkpoint's hidden states (frozen base, head-only training) so its draft tokens are accepted at a high rate during speculative decoding.
Everything is BF16 and the tensor layout matches the upstream base exactly, so it loads anywhere the original loads.
Evaluation
Benchmarked on the full-precision BF16 weights (tensor-parallel = 2, served via vLLM). Same harness across all models (CTI-Bench mini, LiveCodeBench test6 stdin pass@1, BFCL v3).
| Benchmark | Original Qwen3.5-122B-A10B | Qwopus3.5-122B-A10B (base) | OYM-Qimi-122B-A10B-K2.6 |
|---|---|---|---|
| CTI-Bench mini (overall) | 0.705 | 0.715 | 0.695 |
| LiveCodeBench (pass@1) | 0.554 | 0.554 | 0.554 |
| BFCL v3 (overall) | 0.868 | 0.856 | 0.861 |
LiveCodeBench breakdown (OYM-Qimi): easy 26/26 (1.00), medium 18/26 (0.69), hard 18/60 (0.30). BFCL breakdown: live_simple 0.805 / live_multiple 0.810 / simple 0.935 / multiple 0.895.
Despite full decensoring, ~20k-sample SFT, and MTP retraining, OYM-Qimi holds capability: LiveCodeBench is identical (62/112), BFCL is on par (0.861, between Qwen and Qwopus), and CTI-Bench is within run noise. There is no measurable degradation in coding, tool use, or cyber knowledge.
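The per-difficulty LiveCodeBench counts can be cross-checked against the headline pass@1 with a few lines of arithmetic. This sketch uses only the numbers quoted in the breakdown; the variable and function names are illustrative.

```python
# (solved, total) per difficulty bucket, as quoted in the LiveCodeBench breakdown.
breakdown = {"easy": (26, 26), "medium": (18, 26), "hard": (18, 60)}


def pass_at_1(counts: dict[str, tuple[int, int]]) -> tuple[int, int, float]:
    """Aggregate per-bucket counts into (solved, total, overall pass@1)."""
    solved = sum(s for s, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return solved, total, solved / total


solved, total, overall = pass_at_1(breakdown)
per_bucket = {name: round(s / t, 2) for name, (s, t) in breakdown.items()}

print(solved, total, round(overall, 3))  # 62 112 0.554
print(per_bucket)
```

The aggregate reproduces the 62/112 figure and the 0.554 entry in the table.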
Files
| File | Description |
|---|---|
| `model-0000{1..6}-of-00006.safetensors` | BF16 language + vision weights (48 decoder layers, hybrid linear/full attention, MoE 256 routed + shared expert; Qwen3.5 vision tower folded in) |
| `model-mtp-official.safetensors` | BF16 retrained MTP head (785 tensors, 1 hidden layer) |
| `model.safetensors.index.json` | Combined weight map |
| `config.json` | `Qwen3_5MoeForConditionalGeneration`, `model_type: qwen3_5_moe` |
| `tokenizer*`, `chat_template.jinja`, `generation_config.json` | Standard |
Total on disk: ~234 GB.
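To see how tensors are distributed across the shards, the combined weight map can be grouped by file. This is a sketch that relies on the standard safetensors index layout (a top-level `weight_map` from tensor name to shard file); the tensor names in the synthetic example are made up for illustration and are not taken from this checkpoint.

```python
import json
from collections import Counter
from pathlib import Path


def load_index(path: str) -> dict:
    """Read a model.safetensors.index.json file from disk."""
    return json.loads(Path(path).read_text())


def tensors_per_shard(index: dict) -> Counter:
    """Count how many tensor names map to each shard file."""
    return Counter(index["weight_map"].values())


# Synthetic index for illustration only; tensor names are hypothetical.
example_index = {
    "metadata": {"total_size": 0},
    "weight_map": {
        "layer.0.weight": "model-00001-of-00006.safetensors",
        "layer.1.weight": "model-00001-of-00006.safetensors",
        "head.weight": "model-mtp-official.safetensors",
    },
}

counts = tensors_per_shard(example_index)
for shard, n in sorted(counts.items()):
    print(f"{shard}: {n} tensors")
```

On a local copy of the repository, `tensors_per_shard(load_index("model.safetensors.index.json"))` would show whether the entry for `model-mtp-official.safetensors` matches the 785 tensors listed in the table.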
Usage
Transformers (text + vision)
```python
from transformers import AutoModelForImageTextToText, AutoProcessor

repo = "OpenYourMind/OYM-Qimi-122B-A10B-K2.6"

# Load the BF16 weights and shard them across the available GPUs.
model = AutoModelForImageTextToText.from_pretrained(repo, dtype="bfloat16", device_map="auto")
processor = AutoProcessor.from_pretrained(repo)

# One user turn containing an image followed by a text prompt.
messages = [{"role": "user", "content": [
    {"type": "image", "url": "path/to/image.jpg"},
    {"type": "text", "text": "Describe this image."},
]}]

# Apply the chat template and tokenize in a single step.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True,
).to(model.device)

out = model.generate(**inputs, max_new_tokens=512)
# Note: the decoded string includes the prompt as well as the completion.
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```
vLLM with MTP speculative decoding
```bash
vllm serve OpenYourMind/OYM-Qimi-122B-A10B-K2.6 \
  --tensor-parallel-size 2 --max-model-len 32768 \
  --speculative-config '{"method":"mtp","num_speculative_tokens":1}'
```
Then hit the OpenAI-compatible API at http://localhost:8000/v1/chat/completions.
Vision & MTP
Both the vision tower and the MTP head are included and functional.
- Vision works as expected (image / video → text).
- MTP: the head has been retrained for this checkpoint and gives a real speedup under vLLM speculative decoding (~83% draft-token acceptance ⇒ ~1.8× faster decode) while keeping the output greedy-equivalent.
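The relationship between the acceptance rate and the speedup can be illustrated with a simple idealized model. This sketch assumes each draft token is accepted independently with the same probability and ignores the cost of running the MTP head; the card does not say whether its ~1.8× figure was measured or derived this way, so treat the result as an upper-bound estimate.

```python
def expected_tokens_per_step(acceptance: float, num_speculative_tokens: int) -> float:
    """Expected tokens emitted per target-model forward pass.

    The target model always contributes one token. Draft token i counts only
    if all drafts up to and including it were accepted, which happens with
    probability acceptance**i under the independence assumption.
    """
    return sum(acceptance ** i for i in range(num_speculative_tokens + 1))


# One speculative token at ~83% acceptance, as in the settings above.
print(round(expected_tokens_per_step(0.83, 1), 2))  # 1.83
```

With one draft token the estimate reduces to 1 + 0.83, which is in line with the quoted ~1.8× decode speedup.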
Hardware
Full BF16 weights fit on 2× H200 / B200 or 4× H100 (80 GB) with room for context.
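A rough headroom estimate follows from the on-disk size of the weights. This sketch assumes nominal vendor memory figures (141 GB per H200, 80 GB per H100) and treats the ~234 GB checkpoint size as the resident weight footprint; real usable memory is lower once activations, the KV cache, and framework overhead are counted. B200 is left out because its capacity depends on the SKU.

```python
WEIGHTS_GB = 234  # total on-disk size of the checkpoint, as listed under Files

# Nominal aggregate GPU memory in GB (assumed vendor figures).
setups = {
    "2x H200": 2 * 141,
    "4x H100 (80 GB)": 4 * 80,
}


def headroom(total_gb: float, weights_gb: float = WEIGHTS_GB) -> float:
    """Memory left for KV cache and activations after loading the weights."""
    return total_gb - weights_gb


for name, total in setups.items():
    print(f"{name}: {total} GB total, ~{headroom(total)} GB left after weights")
```

The remaining memory is what bounds the context length and batch size, which is why `--max-model-len` may need tuning on the smaller configuration.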
☕ Support Me
☕ If these models are useful to you, consider supporting my work — it funds compute for more & larger abliterations.
Notes
- License: Other (inherits the Qwen3.5 base license).
- Base model: Qwen/Qwen3.5-122B-A10B via the Qwopus3.5 abliterated lineage.
- Modality: Text + Vision (image / video) + MTP.
- Architecture: Qwen3.5 MoE (~10B active / 122B total) + Qwen3.5 vision tower + MTP head.
Disclaimer
This is a decensored/uncensored model. Use is the responsibility of the user; ensure your usage complies with applicable laws, platform rules, and deployment requirements.