---
base_model:
- FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview
- Rombo-Org/Rombo-LLM-V3.1-QWQ-32b
- FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview
- FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview
library_name: transformers
tags:
- mergekit
- merge
---
# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

### Merge Method

This model was merged using the [SCE](https://arxiv.org/abs/2408.07990) merge method using [Rombo-Org/Rombo-LLM-V3.1-QWQ-32b](https://huggingface.co/Rombo-Org/Rombo-LLM-V3.1-QWQ-32b) as a base.

### Models Merged

The following models were included in the merge:

* [FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview)
* [FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview)
* [FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview
    parameters:
      weight: 1.2   # Slightly favor
      density: 0.9  # Sparsified a bit to reduce noise
  - model: FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview
    parameters:
      weight: 1
      density: 0.9
  - model: FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview
    parameters:
      weight: 1
      density: 0.9
merge_method: sce  # SCE for adaptive weighting
base_model: Rombo-Org/Rombo-LLM-V3.1-QWQ-32b
parameters:
  normalize: true
  int8_mask: true
  select_topk: 0.1  # Retain the top 10% high-variance elements
tokenizer_source: union  # Union to combine vocabularies
dtype: bfloat16
```
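
To make the `select_topk: 0.1` setting in the configuration above more concrete, here is a minimal pure-Python sketch of a variance-based selection step on toy data. It is an illustration only, not mergekit's implementation: the function name `select_topk_mask`, the use of plain lists instead of tensors, the rounding rule, and the tie-breaking order are all assumptions made for this example.

```python
from statistics import pvariance


def select_topk_mask(deltas, topk):
    """Return a 0/1 mask keeping the `topk` fraction of positions with the
    highest variance across models.

    `deltas` is a list of equal-length lists, one per fine-tuned model,
    each holding (model weight - base weight) per parameter position.
    Hypothetical helper for illustration; not part of mergekit.
    """
    n = len(deltas[0])
    # Variance of each parameter position across the models.
    variances = [pvariance([d[i] for d in deltas]) for i in range(n)]
    # Assumption: truncate to an integer count and keep at least one element.
    keep = max(1, int(n * topk))
    # Indices of the `keep` highest-variance positions (ties keep index order).
    top = sorted(range(n), key=lambda i: variances[i], reverse=True)[:keep]
    mask = [0] * n
    for i in top:
        mask[i] = 1
    return mask


# Toy example: 3 models, 10 parameters; topk=0.1 keeps a single position.
deltas = [
    [0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0, -0.9, 0.0, 0.0, 0.0, 0.0, 0.0],
]
mask = select_topk_mask(deltas, topk=0.1)
print(mask)
```

In this toy data, position 1 has a nonzero delta in every model but the models agree on it, so its variance is zero; position 4 is where the models disagree most, so it is the one retained.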