DeepSeek-TNG-R1T2-Chimera
Assembly of Experts Chimera model constructed with the DeepSeek R1-0528, R1 and V3-0324 parent models
We present our new DeepSeek-TNG R1T2 Chimera 671B model, the first successor to our original DeepSeek R1T Chimera, which was released on April 26th. Unlike the original Chimera, which was based on the two parent models V3-0324 and R1, the new Chimera is a Tri-Mind with three parents, additionally incorporating R1-0528. It is constructed using the Assembly-of-Experts method with relatively fine-granular direct brain edits. This more refined assembly allowed, among other improvements, fixing the <think> token consistency issue, which was a weakness of R1T and is now solved in R1T2.
Sweet spot
R1T2 operates at a new sweet spot in intelligence vs. output token length. It appears to be...
- about 20% faster than the regular R1, and more than twice as fast as R1-0528
- significantly more intelligent than the regular R1 in benchmarks such as GPQA and AIME-24
- much more intelligent and also think-token consistent compared to the first R1T Chimera 0426
- and generally well-behaved and a nice persona to talk to, even without any system prompt.
Recommendations for your model decision
R1T2 compared...
- vs R1: We hope that R1T2 is a very desirable, almost universally better drop-in replacement for R1
- vs R1-0528: R1T2 is a much cheaper alternative to the full R1-0528, if the full 0528-level intelligence is not required
- vs R1T: R1T2 is usually recommended over R1T, unless R1T's specific personality was optimal, the think-token issue is not important, or R1T's higher speed is crucial
- vs V3-0324: V3 is so much faster that, if you can live with its lower intelligence, you should take V3; if you need reasoning, however, R1T2 is the go-to model
Limitations
- R1-0528 thinks much longer, but also achieves better results on hard benchmarks than R1T2
- As measured by SpeechMap.ai (courtesy of xlr8harder), R1T2 is significantly more reserved than R1T, but not as much as R1-0528
- Due to the influence of its R1 parent, which does not support function calling, R1T2 is not yet recommended for function-calling-intensive applications (this may be addressed at a later stage)
- When moving development from R1T to R1T2, we changed the intelligence-score benchmark set from AIME-24 and MT-Bench to AIME-24, AIME-25 and GPQA-Diamond. With the new benchmark set, the score difference between R1 and the original R1T Chimera is larger than published earlier.
Evaluation results
Evaluation was performed using the evalchemy framework (pass@1 averaged over 10 runs for AIME and 5 runs for GPQA-Diamond, at a temperature of 0.6). We report measured benchmark results for our R1T2 and R1T models, and published benchmark results for V3-0324, R1 and R1-0528.
| | R1T2 | R1T | V3-0324 | R1 | R1-0528 |
|---|---|---|---|---|---|
| AIME-24 | 82.3 | 74.7 | 59.4 | 79.8 | 91.4 |
| AIME-25 | 70.0 | 58.3 | 49.6* | 70.0 | 87.5 |
| GPQA-Diamond | 77.9 | 72.0 | 68.4 | 71.5 | 81.0 |
* V3-0324 AIME-25 measured by us
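
For reference, the reported numbers are simple averages of per-run pass@1 accuracies. The sketch below is not part of the evalchemy framework; it only illustrates the averaging, using made-up result data.

```python
# Minimal sketch of how "pass@1 averaged over N runs" is computed; this is
# an illustration, not the evalchemy implementation. The data below is made up.
from statistics import mean

def averaged_pass_at_1(per_run_correct: list[list[bool]]) -> float:
    """per_run_correct[r][q]: whether question q was solved in run r.

    Each run yields its own accuracy; the reported score is the mean of
    these accuracies (10 runs for AIME, 5 runs for GPQA-Diamond above).
    """
    run_accuracies = [mean(run) for run in per_run_correct]
    return 100.0 * mean(run_accuracies)

# Hypothetical example: 3 runs over 4 questions
runs = [
    [True, True, False, True],
    [True, False, False, True],
    [True, True, True, True],
]
print(f"pass@1 = {averaged_pass_at_1(runs):.1f}")  # 75.0
```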
Technological background
For details on the AoE construction process, you can read our paper on arXiv.
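
To give a rough idea of what a weight-space assembly looks like, here is a minimal, purely illustrative sketch: selected parameter tensors (for example routed-expert weights) are combined from the corresponding tensors of the three parents. The tensor shapes, the selection and the merge coefficients below are placeholders, not the actual R1T2 recipe described in the paper.

```python
# Illustrative sketch in the spirit of Assembly of Experts: merge one
# parameter tensor as a convex combination of the three parent checkpoints.
# All names, shapes and coefficients here are hypothetical.
import torch

def merge_tensor(parents: dict[str, torch.Tensor], coeffs: dict[str, float]) -> torch.Tensor:
    """Linearly combine one parameter tensor from several parent checkpoints."""
    assert abs(sum(coeffs.values()) - 1.0) < 1e-6, "coefficients should sum to 1"
    merged = torch.zeros_like(next(iter(parents.values())))
    for name, tensor in parents.items():
        merged += coeffs[name] * tensor
    return merged

# Hypothetical usage for a single expert weight matrix:
parents = {
    "V3-0324": torch.randn(128, 256),
    "R1":      torch.randn(128, 256),
    "R1-0528": torch.randn(128, 256),
}
coeffs = {"V3-0324": 0.4, "R1": 0.2, "R1-0528": 0.4}  # placeholder values
merged_expert = merge_tensor(parents, coeffs)
```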
Runtime parameter settings
- Most of our evaluation was done with a maximum context size of 60,000 tokens. With a context size of 130,000 tokens, the model proved helpful in interpreting very long debug logs. Long-context testing was less extensive, though.
- We run the model using vLLM on 8xH200 and MI325X nodes; additionally, we have tested the model using SGLang, which is also used by chutes.ai.
- For SGLang, we recommend using versions >= v0.4.8 in combination with the argument `--reasoning-parser qwen3` to properly handle rare cases in which the model skips the `<think>` reasoning step.
- For vLLM, we recommend not using the `--chat-template` parameter; we observed degenerate `<think>` token consistency otherwise. An example request against a server launched with these settings is sketched below.
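
Both vLLM and SGLang expose an OpenAI-compatible server, so a minimal client-side request can look like the following sketch. The base URL, port and served model name are assumptions for illustration; adjust them to your deployment.

```python
# Minimal sketch of querying a locally served R1T2 endpoint via the
# OpenAI-compatible API of vLLM or SGLang. URL and model name are assumed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="tngtech/DeepSeek-TNG-R1T2-Chimera",
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
    temperature=0.6,   # the temperature used in our evaluations
    max_tokens=4096,
)

text = response.choices[0].message.content
# The reasoning trace is wrapped in <think>...</think>; a missing opening tag
# would indicate the consistency issue the recommended settings avoid.
print(text)
```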
Model Details
- Architecture: DeepSeek-MoE transformer-based language model
- Combination Method: Assembly of Experts from the three DeepSeek parent models R1-0528, R1 and V3-0324
- Release Date: 2025-07-02
- Design Team: Robert Dahlke, Henrik Klagges, Benjamin Merkel, Fabian Klemm and David Reiss, Munich, Germany
- Extra Thanks: Big thanks to DeepSeek for their great models and open-source generosity, and to the other researchers who have published on model merging methodologies.
Use, Out-of-scope Use, Other Limitations, Risks, Recommendations et al.
Regarding the R1T/R1T2-Chimeras, we ask you to follow the careful guidelines that Microsoft has created for their "MAI-DS-R1" DeepSeek-based model. These professional guidelines are available here on Hugging Face.
EU AI Act
Due to the strict new guidelines of the EU AI Act that take effect on August 2nd, 2025, we recommend that each R1T/R1T2 user in the EU either familiarizes themselves with these requirements and assesses their compliance, or ceases using the model in the EU after August 1st, 2025.
Contact, especially for your user feedback
Please give us your feedback, especially if you find deficiencies in the model:
- Email: research@tngtech.com
- X.com: @tngtech
Citation
@misc{tng_technology_consulting_gmbh_2025_07_02,
author = { TNG Technology Consulting GmbH },
title = { DeepSeek-TNG-R1T2-Chimera },
year = 2025,
month = { July },
url = { https://huggingface.co/tngtech/DeepSeek-TNG-R1T2-Chimera },
doi = { 10.57967/hf/5950 },
publisher = { Hugging Face }
}