# Qwen2.5-14B-MetaMergev2

This is a merge of pre-trained language models created using mergekit.

## Merge Details

### Merge Method

This model was merged using the DARE TIES merge method, with CultriX/Qwen2.5-14B-Brocav7 as the base model.
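
To make the method concrete: DARE TIES operates on per-model "task vectors" (each model's delta from the base). DARE randomly drops a fraction of each delta controlled by `density` and rescales the survivors, and TIES resolves sign conflicts between models before the weighted deltas are summed back onto the base, so the `weight` and `density` values in the configuration below map directly onto those two steps. The snippet below is an illustrative single-tensor sketch of that update rule in plain PyTorch, not mergekit's implementation; the `dare_ties_merge` helper and the toy tensors are hypothetical.

```python
# Illustrative sketch of the DARE-TIES update for one tensor (NOT mergekit's code).
import torch

def dare_ties_merge(base, finetuned, weights, densities):
    """base: tensor from the base model; finetuned: same-shaped tensors from the
    merged-in models; weights/densities: the per-model scalars from the YAML."""
    deltas = []
    for theta, w, d in zip(finetuned, weights, densities):
        delta = theta - base                           # task vector relative to the base
        keep = (torch.rand_like(delta) < d).float()    # DARE: keep each element with prob = density
        deltas.append(w * delta * keep / d)            # rescale survivors, apply per-model weight
    stacked = torch.stack(deltas)
    sign = torch.sign(stacked.sum(dim=0))              # TIES: elect a sign per parameter
    agree = (torch.sign(stacked) == sign).float()      # drop deltas that conflict with it
    return base + (stacked * agree).sum(dim=0)

# Toy usage with two "models" merged onto a zero base.
base = torch.zeros(4)
models = [torch.tensor([0.2, -0.1, 0.3, 0.0]), torch.tensor([0.1, 0.2, -0.4, 0.1])]
print(dare_ties_merge(base, models, weights=[0.6, 0.4], densities=[0.55, 0.45]))
```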

### Models Merged

The following models were included in the merge:

* djuna/Q2.5-Veltha-14B-0.5
* allknowingroger/QwenSlerp6-14B
* sometimesanotion/Qwenvergence-14B-v3-Prose
* CultriX/Qwen2.5-14B-Broca
* CultriX/Qwenfinity-2.5-14B
* CultriX/Qwen2.5-14B-Unity
* CultriX/Qwen2.5-14B-Wernickev3
* CultriX/Qwen2.5-14B-partialmergept1
* CultriX/Qwen2.5-14B-Brocav9

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: CultriX/Qwen2.5-14B-Brocav7
    parameters:
      weight: 0.18  # Backbone for logical reasoning and multitask performance.
      density: 0.55  # Balances precision and versatility for critical tasks.
  - model: djuna/Q2.5-Veltha-14B-0.5
    parameters:
      weight: 0.15  # Advanced reasoning contributor with balanced impact.
      density: 0.45  # Retains essential parameters for contextual tasks.
  - model: allknowingroger/QwenSlerp6-14B
    parameters:
      weight: 0.09  # Specialized contributions to MMLU-PRO and multitask performance.
      density: 0.35  # Focused on key parameters for contextual reasoning.
  - model: sometimesanotion/Qwenvergence-14B-v3-Prose
    parameters:
      weight: 0.09  # Supports MATH, GPQA, and MUSR benchmarks without redundancy.
      density: 0.40  # Balanced retention for logical reasoning.
  - model: CultriX/Qwen2.5-14B-Broca
    parameters:
      weight: 0.07  # Logical reasoning and tiny benchmarks contributor.
      density: 0.40  # Ensures critical reasoning parameters are preserved.
  - model: CultriX/Qwenfinity-2.5-14B
    parameters:
      weight: 0.05  # Generalist multitask performer with broad contributions.
      density: 0.40  # Balances multitask performance and precision.
  - model: CultriX/Qwen2.5-14B-Unity
    parameters:
      weight: 0.04  # Enhances MUSR and BBH tasks with unique capabilities.
      density: 0.40  # Retains enough parameters for balanced task support.
  - model: CultriX/Qwen2.5-14B-Wernickev3
    parameters:
      weight: 0.03  # Focused on language understanding and MUSR tasks.
      density: 0.35  # Preserves high-quality parameters without overlap.
  - model: CultriX/Qwen2.5-14B-partialmergept1
    parameters:
      weight: 0.13  # Balanced contributions to multitask benchmarks like MMLU-PRO.
      density: 0.45  # Retains essential parameters without over-representation.
  - model: CultriX/Qwen2.5-14B-Brocav9
    parameters:
      weight: 0.19  # Strong logical reasoning and multitask contributor.
      density: 0.50  # Retains more parameters to maximize impact.

base_model: CultriX/Qwen2.5-14B-Brocav7
# Chosen for its logical reasoning and task versatility.

merge_method: dare_ties
# Ensures smooth integration of diverse model strengths.

parameters:
  normalize: true  # Ensures consistency in parameter scaling.
  int8_mask: true  # Optimizes memory and computation.

dtype: bfloat16
# Provides high precision with efficient memory usage, ideal for large-scale models.

tokenizer_source: CultriX/Qwen2.5-14B-Brocav7
# Matches the tokenizer to the base model for compatibility.

adaptive_merge_parameters:
  task_weights:
    tinyArc: 1.85        # Logical reasoning priority from Brocav7 and Brocav9.
    tinyHellaswag: 1.7   # Balanced contextual understanding.
    tinyMMLU: 1.9        # Enhanced domain knowledge from multitask models.
    tinyTruthfulQA: 2.2  # Prioritized factual reasoning with Veltha's strength.
    tinyTruthfulQA_mc1: 2.0  # Balanced focus on multiple-choice reasoning.
    tinyWinogrande: 2.0  # Advanced contextual predictions.
    IFEval: 2.5          # Instruction-following maximized with Brocav9.
    BBH: 2.2             # Strengthened for complex reasoning tasks.
    MATH: 2.4            # High focus on mathematical problem-solving.
    GPQA: 2.15           # Enhanced QA capabilities leveraging Brocav7 and Brocav9.
    MUSR: 2.2            # Balanced for multi-step reasoning improvements.
    MMLU-PRO: 2.35       # High domain multitask performance weight.
  smoothing_factor: 0.03
  # Further reduced for sharper task-specific blending, preserving distinct strengths.

gradient_clipping:
  CultriX/Qwen2.5-14B-Brocav7: 0.77  # Ensures stable contributions while leveraging strong logical reasoning.
  djuna/Q2.5-Veltha-14B-0.5: 0.83   # Optimized for advanced reasoning and MUSR tasks.
  allknowingroger/QwenSlerp6-14B: 0.80  # Supports contributions to MMLU-PRO while maintaining stability.
  sometimesanotion/Qwenvergence-14B-v3-Prose: 0.79  # Calibrated for precision in MATH, GPQA, and MUSR.
  CultriX/Qwen2.5-14B-Broca: 0.81   # Fine-tuned for logical reasoning enhancements.
  CultriX/Qwenfinity-2.5-14B: 0.79  # Balanced for multitask contributions.
  CultriX/Qwen2.5-14B-Unity: 0.81   # Calibrated to support unique task contributions.
  CultriX/Qwen2.5-14B-Wernickev3: 0.83  # Optimized for high-quality language understanding and GPQA.
  CultriX/Qwen2.5-14B-partialmergept1: 0.82  # Supports balanced multitask performance.
  CultriX/Qwen2.5-14B-Brocav9: 0.83  # Further optimized for logical reasoning and multitask improvements.
```
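
To reproduce the merge, the configuration above can be saved to a YAML file and passed to mergekit's `mergekit-yaml` entry point (installed via `pip install mergekit`). The file name, output directory, and `--cuda` flag below are assumptions, and exact flags may differ between mergekit versions.

```python
# Hypothetical reproduction sketch: invoke the mergekit CLI on the config above.
import subprocess

subprocess.run(
    ["mergekit-yaml", "merge-config.yaml", "./Qwen2.5-14B-MetaMergev2", "--cuda"],
    check=True,
)
```

The merged checkpoint (or the published weights) then loads like any other Qwen2.5 model with transformers, in bfloat16 to match the `dtype` above. The repository id below is an assumption based on this card's title.

```python
# Minimal usage sketch; requires transformers, torch, and accelerate (for device_map).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CultriX/Qwen2.5-14B-MetaMergev2"  # hypothetical repo id for this merge

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the merge dtype
    device_map="auto",
)

prompt = "Summarize the DARE TIES merge method in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```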