---
base_model:
- sometimesanotion/Qwenvergence-14B-v3-Prose
- djuna/Q2.5-Veltha-14B-0.5
- CultriX/Qwen2.5-14B-Wernickev3
- CultriX/Qwen2.5-14B-Brocav7
- CultriX/Qwen2.5-14B-Brocav9
- CultriX/Qwen2.5-14B-partialmergept1
- CultriX/Qwen2.5-14B-Broca
- CultriX/Qwen2.5-14B-Unity
- CultriX/Qwenfinity-2.5-14B
- allknowingroger/QwenSlerp6-14B
library_name: transformers
tags:
- mergekit
- merge
---
# Qwen2.5-14B-MetaMergev2

This is a merge of pre-trained language models created using [mergekit](https://github.com/arcee-ai/mergekit).
## Merge Details

### Merge Method

This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method, with [CultriX/Qwen2.5-14B-Brocav7](https://huggingface.co/CultriX/Qwen2.5-14B-Brocav7) as the base.
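In DARE TIES, each contributor is reduced to a task vector (its delta from the base): DARE randomly drops entries of that vector at a rate set by `density` and rescales the survivors, and TIES then elects a majority sign per parameter and discards conflicting updates before the remainder is summed into the base. The NumPy sketch below is a toy illustration of that procedure on flat vectors, not mergekit's implementation; the example `weight` and `density` values are borrowed from the configuration further down.

```python
# Toy DARE-TIES on flat parameter vectors; a simplified illustration only.
import numpy as np

rng = np.random.default_rng(0)

def dare_ties(base, finetuned, weights, densities):
    deltas = []
    for ft, w, d in zip(finetuned, weights, densities):
        delta = ft - base                       # task vector relative to the base
        mask = rng.random(delta.shape) < d      # DARE: keep each entry with prob = density
        delta = np.where(mask, delta / d, 0.0)  # rescale survivors to preserve expectation
        deltas.append(w * delta)
    stacked = np.stack(deltas)
    elected = np.sign(stacked.sum(axis=0))      # TIES: majority sign election per parameter
    agree = np.where(np.sign(stacked) == elected, stacked, 0.0)  # drop conflicting updates
    return base + agree.sum(axis=0)             # merge the agreeing, weighted task vectors

base = rng.normal(size=8)
fts = [base + rng.normal(scale=0.1, size=8) for _ in range(3)]
merged = dare_ties(base, fts, weights=[0.18, 0.15, 0.09], densities=[0.55, 0.45, 0.35])
print(merged)
```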
### Models Merged

The following models were included in the merge:
- [sometimesanotion/Qwenvergence-14B-v3-Prose](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v3-Prose)
- [djuna/Q2.5-Veltha-14B-0.5](https://huggingface.co/djuna/Q2.5-Veltha-14B-0.5)
- [CultriX/Qwen2.5-14B-Wernickev3](https://huggingface.co/CultriX/Qwen2.5-14B-Wernickev3)
- [CultriX/Qwen2.5-14B-Brocav9](https://huggingface.co/CultriX/Qwen2.5-14B-Brocav9)
- [CultriX/Qwen2.5-14B-partialmergept1](https://huggingface.co/CultriX/Qwen2.5-14B-partialmergept1)
- [CultriX/Qwen2.5-14B-Broca](https://huggingface.co/CultriX/Qwen2.5-14B-Broca)
- [CultriX/Qwen2.5-14B-Unity](https://huggingface.co/CultriX/Qwen2.5-14B-Unity)
- [CultriX/Qwenfinity-2.5-14B](https://huggingface.co/CultriX/Qwenfinity-2.5-14B)
- [allknowingroger/QwenSlerp6-14B](https://huggingface.co/allknowingroger/QwenSlerp6-14B)
### Configuration

The following YAML configuration was used to produce this model:
```yaml
models:
  - model: CultriX/Qwen2.5-14B-Brocav7
    parameters:
      weight: 0.18   # Backbone for logical reasoning and multitask performance.
      density: 0.55  # Balances precision and versatility for critical tasks.
  - model: djuna/Q2.5-Veltha-14B-0.5
    parameters:
      weight: 0.15   # Advanced reasoning contributor with balanced impact.
      density: 0.45  # Retains essential parameters for contextual tasks.
  - model: allknowingroger/QwenSlerp6-14B
    parameters:
      weight: 0.09   # Specialized contributions to MMLU-PRO and multitask performance.
      density: 0.35  # Focused on key parameters for contextual reasoning.
  - model: sometimesanotion/Qwenvergence-14B-v3-Prose
    parameters:
      weight: 0.09   # Supports MATH, GPQA, and MUSR benchmarks without redundancy.
      density: 0.40  # Balanced retention for logical reasoning.
  - model: CultriX/Qwen2.5-14B-Broca
    parameters:
      weight: 0.07   # Contributes logical reasoning and tiny-benchmark coverage.
      density: 0.40  # Ensures critical reasoning parameters are preserved.
  - model: CultriX/Qwenfinity-2.5-14B
    parameters:
      weight: 0.05   # Generalist multitask performer with broad contributions.
      density: 0.40  # Balances multitask performance and precision.
  - model: CultriX/Qwen2.5-14B-Unity
    parameters:
      weight: 0.04   # Enhances MUSR and BBH tasks with unique capabilities.
      density: 0.40  # Retains enough parameters for balanced task support.
  - model: CultriX/Qwen2.5-14B-Wernickev3
    parameters:
      weight: 0.03   # Focused on language understanding and MUSR tasks.
      density: 0.35  # Preserves high-quality parameters without overlap.
  - model: CultriX/Qwen2.5-14B-partialmergept1
    parameters:
      weight: 0.13   # Balanced contributions to multitask benchmarks like MMLU-PRO.
      density: 0.45  # Retains essential parameters without over-representation.
  - model: CultriX/Qwen2.5-14B-Brocav9
    parameters:
      weight: 0.19   # Strong logical reasoning and multitask contributor.
      density: 0.50  # Retains more parameters to maximize impact.

base_model: CultriX/Qwen2.5-14B-Brocav7  # Chosen for its logical reasoning and task versatility.
merge_method: dare_ties                  # Ensures smooth integration of diverse model strengths.
parameters:
  normalize: true   # Keeps parameter scaling consistent across contributors.
  int8_mask: true   # Reduces memory and computation during the merge.
dtype: bfloat16     # Memory-efficient precision format well suited to large-scale models.
tokenizer_source: CultriX/Qwen2.5-14B-Brocav7  # Matches the tokenizer to the base model for compatibility.

adaptive_merge_parameters:
  task_weights:
    tinyArc: 1.85            # Logical reasoning priority from Brocav7 and Brocav9.
    tinyHellaswag: 1.7       # Balanced contextual understanding.
    tinyMMLU: 1.9            # Enhanced domain knowledge from multitask models.
    tinyTruthfulQA: 2.2      # Prioritized factual reasoning, leaning on Veltha's strength.
    tinyTruthfulQA_mc1: 2.0  # Balanced focus on multiple-choice reasoning.
    tinyWinogrande: 2.0      # Advanced contextual predictions.
    IFEval: 2.5              # Instruction following maximized via Brocav9.
    BBH: 2.2                 # Strengthened for complex reasoning tasks.
    MATH: 2.4                # High focus on mathematical problem-solving.
    GPQA: 2.15               # Enhanced QA capabilities leveraging Brocav7 and Brocav9.
    MUSR: 2.2                # Balanced for multi-step reasoning improvements.
    MMLU-PRO: 2.35           # High weight on domain multitask performance.
  smoothing_factor: 0.03     # Further reduced for sharper task-specific blending, preserving distinct strengths.

gradient_clipping:
  CultriX/Qwen2.5-14B-Brocav7: 0.77                 # Stable contributions from strong logical reasoning.
  djuna/Q2.5-Veltha-14B-0.5: 0.83                   # Optimized for advanced reasoning and MUSR tasks.
  allknowingroger/QwenSlerp6-14B: 0.80              # Supports MMLU-PRO contributions while maintaining stability.
  sometimesanotion/Qwenvergence-14B-v3-Prose: 0.79  # Calibrated for precision in MATH, GPQA, and MUSR.
  CultriX/Qwen2.5-14B-Broca: 0.81                   # Fine-tuned for logical reasoning enhancements.
  CultriX/Qwenfinity-2.5-14B: 0.79                  # Balanced for multitask contributions.
  CultriX/Qwen2.5-14B-Unity: 0.81                   # Calibrated to support unique task contributions.
  CultriX/Qwen2.5-14B-Wernickev3: 0.83              # Optimized for high-quality language understanding and GPQA.
  CultriX/Qwen2.5-14B-partialmergept1: 0.82         # Supports balanced multitask performance.
  CultriX/Qwen2.5-14B-Brocav9: 0.83                 # Further optimized for logical reasoning and multitask improvements.
```
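To reproduce the merge, something like the following should work with mergekit's Python API (a minimal sketch based on mergekit's example notebook; the `mergekit-yaml` CLI pointed at the saved config is the simpler route). It assumes the YAML above has been saved as `config.yaml`.

```python
# Minimal sketch of running this merge via mergekit's Python API.
# Assumes mergekit is installed and the YAML above is saved as config.yaml.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yaml", "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    out_path="./Qwen2.5-14B-MetaMergev2",  # output directory (assumed name)
    options=MergeOptions(
        cuda=True,            # use a GPU if available
        copy_tokenizer=True,  # copy the tokenizer_source files into the output
        lazy_unpickle=True,   # lower peak memory while loading checkpoints
        low_cpu_memory=True,
    ),
)
```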
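Once merged (or downloaded), the model loads like any other Qwen2.5 causal LM. Below is a minimal inference sketch with transformers; the repo id is assumed from the model name above and may differ from the actual upload.

```python
# Minimal inference sketch with transformers; repo id below is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CultriX/Qwen2.5-14B-MetaMergev2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used for the merge
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the TIES merge method in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```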