# Qwen2.5-14B-MetaMergev2

This is a merge of pre-trained language models created using mergekit.

## Merge Details

### Merge Method

This model was merged using the DARE TIES merge method, with CultriX/Qwen2.5-14B-Brocav7 as the base model.
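
To make the method concrete: DARE TIES operates on per-model "task vectors" (each model's delta from the base). DARE randomly drops a fraction of each delta controlled by `density` and rescales the survivors, and TIES resolves sign conflicts between models before the weighted deltas are summed back onto the base, so the `weight` and `density` values in the configuration below map directly onto those two steps. The snippet below is an illustrative single-tensor sketch of that update rule in plain PyTorch, not mergekit's implementation; the `dare_ties_merge` helper and the toy tensors are hypothetical.

```python
# Illustrative sketch of the DARE-TIES update for one tensor (NOT mergekit's code).
import torch

def dare_ties_merge(base, finetuned, weights, densities):
    """base: tensor from the base model; finetuned: same-shaped tensors from the
    merged-in models; weights/densities: the per-model scalars from the YAML."""
    deltas = []
    for theta, w, d in zip(finetuned, weights, densities):
        delta = theta - base                           # task vector relative to the base
        keep = (torch.rand_like(delta) < d).float()    # DARE: keep each element with prob = density
        deltas.append(w * delta * keep / d)            # rescale survivors, apply per-model weight
    stacked = torch.stack(deltas)
    sign = torch.sign(stacked.sum(dim=0))              # TIES: elect a sign per parameter
    agree = (torch.sign(stacked) == sign).float()      # drop deltas that conflict with it
    return base + (stacked * agree).sum(dim=0)

# Toy usage with two "models" merged onto a zero base.
base = torch.zeros(4)
models = [torch.tensor([0.2, -0.1, 0.3, 0.0]), torch.tensor([0.1, 0.2, -0.4, 0.1])]
print(dare_ties_merge(base, models, weights=[0.6, 0.4], densities=[0.55, 0.45]))
```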

### Models Merged

The following models were included in the merge:

* djuna/Q2.5-Veltha-14B-0.5
* allknowingroger/QwenSlerp6-14B
* sometimesanotion/Qwenvergence-14B-v3-Prose
* CultriX/Qwen2.5-14B-Broca
* CultriX/Qwenfinity-2.5-14B
* CultriX/Qwen2.5-14B-Unity
* CultriX/Qwen2.5-14B-Wernickev3
* CultriX/Qwen2.5-14B-partialmergept1
* CultriX/Qwen2.5-14B-Brocav9

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: CultriX/Qwen2.5-14B-Brocav7
    parameters:
      weight: 0.18  # Backbone for logical reasoning and multitask performance.
      density: 0.55  # Balances precision and versatility for critical tasks.
  - model: djuna/Q2.5-Veltha-14B-0.5
    parameters:
      weight: 0.15  # Advanced reasoning contributor with balanced impact.
      density: 0.45  # Retains essential parameters for contextual tasks.
  - model: allknowingroger/QwenSlerp6-14B
    parameters:
      weight: 0.09  # Specialized contributions to MMLU-PRO and multitask performance.
      density: 0.35  # Focused on key parameters for contextual reasoning.
  - model: sometimesanotion/Qwenvergence-14B-v3-Prose
    parameters:
      weight: 0.09  # Supports MATH, GPQA, and MUSR benchmarks without redundancy.
      density: 0.40  # Balanced retention for logical reasoning.
  - model: CultriX/Qwen2.5-14B-Broca
    parameters:
      weight: 0.07  # Logical reasoning and tiny benchmarks contributor.
      density: 0.40  # Ensures critical reasoning parameters are preserved.
  - model: CultriX/Qwenfinity-2.5-14B
    parameters:
      weight: 0.05  # Generalist multitask performer with broad contributions.
      density: 0.40  # Balances multitask performance and precision.
  - model: CultriX/Qwen2.5-14B-Unity
    parameters:
      weight: 0.04  # Enhances MUSR and BBH tasks with unique capabilities.
      density: 0.40  # Retains enough parameters for balanced task support.
  - model: CultriX/Qwen2.5-14B-Wernickev3
    parameters:
      weight: 0.03  # Focused on language understanding and MUSR tasks.
      density: 0.35  # Preserves high-quality parameters without overlap.
  - model: CultriX/Qwen2.5-14B-partialmergept1
    parameters:
      weight: 0.13  # Balanced contributions to multitask benchmarks like MMLU-PRO.
      density: 0.45  # Retains essential parameters without over-representation.
  - model: CultriX/Qwen2.5-14B-Brocav9
    parameters:
      weight: 0.19  # Strong logical reasoning and multitask contributor.
      density: 0.50  # Retains more parameters to maximize impact.

base_model: CultriX/Qwen2.5-14B-Brocav7
# Chosen for its logical reasoning and task versatility.

merge_method: dare_ties
# Ensures smooth integration of diverse model strengths.

parameters:
  normalize: true  # Ensures consistency in parameter scaling.
  int8_mask: true  # Optimizes memory and computation.

dtype: bfloat16
# Provides high precision with efficient memory usage, ideal for large-scale models.

tokenizer_source: CultriX/Qwen2.5-14B-Brocav7
# Matches the tokenizer to the base model for compatibility.

adaptive_merge_parameters:
  task_weights:
    tinyArc: 1.85        # Logical reasoning priority from Brocav7 and Brocav9.
    tinyHellaswag: 1.7   # Balanced contextual understanding.
    tinyMMLU: 1.9        # Enhanced domain knowledge from multitask models.
    tinyTruthfulQA: 2.2  # Prioritized factual reasoning with Veltha's strength.
    tinyTruthfulQA_mc1: 2.0  # Balanced focus on multiple-choice reasoning.
    tinyWinogrande: 2.0  # Advanced contextual predictions.
    IFEval: 2.5          # Instruction-following maximized with Brocav9.
    BBH: 2.2             # Strengthened for complex reasoning tasks.
    MATH: 2.4            # High focus on mathematical problem-solving.
    GPQA: 2.15           # Enhanced QA capabilities leveraging Brocav7 and Brocav9.
    MUSR: 2.2            # Balanced for multi-step reasoning improvements.
    MMLU-PRO: 2.35       # High domain multitask performance weight.
  smoothing_factor: 0.03
  # Further reduced for sharper task-specific blending, preserving distinct strengths.

gradient_clipping:
  CultriX/Qwen2.5-14B-Brocav7: 0.77  # Ensures stable contributions while leveraging strong logical reasoning.
  djuna/Q2.5-Veltha-14B-0.5: 0.83   # Optimized for advanced reasoning and MUSR tasks.
  allknowingroger/QwenSlerp6-14B: 0.80  # Supports contributions to MMLU-PRO while maintaining stability.
  sometimesanotion/Qwenvergence-14B-v3-Prose: 0.79  # Calibrated for precision in MATH, GPQA, and MUSR.
  CultriX/Qwen2.5-14B-Broca: 0.81   # Fine-tuned for logical reasoning enhancements.
  CultriX/Qwenfinity-2.5-14B: 0.79  # Balanced for multitask contributions.
  CultriX/Qwen2.5-14B-Unity: 0.81   # Calibrated to support unique task contributions.
  CultriX/Qwen2.5-14B-Wernickev3: 0.83  # Optimized for high-quality language understanding and GPQA.
  CultriX/Qwen2.5-14B-partialmergept1: 0.82  # Supports balanced multitask performance.
  CultriX/Qwen2.5-14B-Brocav9: 0.83  # Further optimized for logical reasoning and multitask improvements.
```
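
To reproduce the merge, the configuration above can be saved to a YAML file and passed to mergekit's `mergekit-yaml` entry point (installed via `pip install mergekit`). The file name, output directory, and `--cuda` flag below are assumptions, and exact flags may differ between mergekit versions.

```python
# Hypothetical reproduction sketch: invoke the mergekit CLI on the config above.
import subprocess

subprocess.run(
    ["mergekit-yaml", "merge-config.yaml", "./Qwen2.5-14B-MetaMergev2", "--cuda"],
    check=True,
)
```

The merged checkpoint (or the published weights) then loads like any other Qwen2.5 model with transformers, in bfloat16 to match the `dtype` above. The repository id below is an assumption based on this card's title.

```python
# Minimal usage sketch; requires transformers, torch, and accelerate (for device_map).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CultriX/Qwen2.5-14B-MetaMergev2"  # hypothetical repo id for this merge

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the merge dtype
    device_map="auto",
)

prompt = "Summarize the DARE TIES merge method in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```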