---
license: apache-2.0
language:
  - en
base_model:
  - Qwen/Qwen3.6-35B-A3B
datasets:
  - Gryphe/Opus-4.6-Reasoning-24k
tags:
  - qwen3_6_moe
  - conversational
  - instruct
  - finetune
  - chatml
  - axolotl
  - roleplay
  - reasoning
  - creative-writing
pipeline_tag: text-generation
---

# WorldSim-Opus-3.6-35B-A3B


An experiment in fusing creative world simulation and genuine reasoning capability into a single Qwen 3.6 MoE model.

The idea here was simple: find out whether a small reasoning model can roleplay properly if fed high quality data. Every dataset used here includes full thinking traces, so the model reasons its way through creative writing — planning story beats, considering character motivations, and working through consequences before committing to a response.

...Or so the theory goes!

GGUF quants are available here.

## Model details

The base model is Qwen/Qwen3.6-35B-A3B. MoEs are neat little things, and this one is (finally!) remarkably easy to train with Axolotl. Worth noting: thinking traces are noticeably more concise post-training; Qwen3.6 loves to spiral, and apparently feeding it stories helped.

All three training sources are reasoning datasets, meaning every assistant turn includes a full thinking trace:

- **Opus-4.6-Reasoning-24k** (50%) - a cleaned and deduplicated aggregation of Claude Opus 4.6 reasoning traces, covering general instruction-following, STEM, and coding domains
- **WorldSim data** (40%) - long-form Opus 4.6 narrative roleplay with full reasoning traces, focusing on extended storytelling, character immersion, and emergent world logic. Cobbled together through various experiments; mainly third-person present tense, but it has a bit of everything. Cliché-cleaned, of course!
- **Tiamat data** (10%) - a character and roleplay dataset originally built for Tiamat-24B-Magistral, featuring a multi-step generation/extension/improvement pipeline with critic-improver rewrites to reduce AI clichés, with reasoning back-generated for each exchange

The model was trained with `preserve_thinking: true`, so thinking tags are active across all assistant turns in multi-turn conversations, not just the last one.
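To make that concrete, here's a minimal sketch of what a multi-turn ChatML prompt looks like when earlier thinking is kept rather than stripped. The helper and message contents are purely illustrative, not the actual training or template code:

```python
# Illustrative: render multi-turn ChatML while keeping <think> blocks in
# every assistant turn (preserve_thinking-style), instead of only the last.

def render_chatml(messages, keep_all_thinking=True):
    parts = []
    # Index of the final assistant turn (the only one that keeps its
    # thinking when keep_all_thinking is False).
    last_assistant = max(
        (i for i, m in enumerate(messages) if m["role"] == "assistant"),
        default=-1,
    )
    for i, m in enumerate(messages):
        content = m["content"]
        if m["role"] == "assistant":
            thinking = m.get("thinking", "")
            if thinking and (keep_all_thinking or i == last_assistant):
                content = f"<think>\n{thinking}\n</think>\n{content}"
        parts.append(f"<|im_start|>{m['role']}\n{content}<|im_end|>")
    return "\n".join(parts)

messages = [
    {"role": "user", "content": "Describe the harbor at dawn."},
    {"role": "assistant", "thinking": "Set scene, then mood.",
     "content": "Mist curls off the water..."},
    {"role": "user", "content": "Continue."},
]
prompt = render_chatml(messages)
```

In practice Qwen3.6's bundled chat template handles this for you; the sketch just shows the difference preservation makes to what earlier assistant turns contain.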

## Inference

These settings have been working well for me:

"temperature": 0.8,
"repetition_penalty": 1.05,
"min_p": 0.05

I obviously recommend leaving thinking enabled, ideally with `preserve_thinking` turned on.

## Prompt Format

The model was trained using ChatML via Qwen3.6's chat template, which should be applied automatically.

Since reasoning doesn't tend to play nice with character name prefixes, I'm inclined to recommend against using them.
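If your frontend insists on prepending speaker prefixes, you can strip them before sending. The pattern below is a hypothetical example, not something the model requires; tune it to your frontend's actual formatting:

```python
import re

# Illustrative: remove a leading "Name:" speaker prefix from a message.
# The pattern is an assumption about typical frontend formatting.
PREFIX = re.compile(r"^[A-Z][\w' -]{0,30}:\s*")

def strip_name_prefix(text: str) -> str:
    return PREFIX.sub("", text, count=1)
```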

## Notes

As always, this is a research release that hasn't gone through extensive quality testing beyond basic sanity checks. The blend of reasoning + creative data is an experiment, and I'm genuinely not sure how well the two domains mix in practice. Let me know what you find! To me it feels absurdly promising, but I could be very wrong here, hence me sharing it with you all.

## Credits

- Everyone from Anthracite! Hi, guys! Still alive!
- Latitude, who decided to take me on as a finetuner and gave me the chance to accumulate even more experience in this fascinating field
- All the original dataset authors behind the Opus 4.6 reasoning data — full credits in the dataset card
- All the folks I chat with on a daily basis on Discord! You know who you are.
- Anyone I forgot to mention, just in case!