---
base_model:
- sometimesanotion/Lamarck-14B-v0.7-rc4
- sthenno/tempesthenno-ppo-ckpt40
library_name: transformers
tags:
- mergekit
- merge
---
# merge
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
## Merge Details
### Merge Method
This model was merged using the [SLERP](https://en.wikipedia.org/wiki/Slerp) merge method.
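
As a rough illustration of what SLERP does to a pair of weight vectors, here is a minimal pure-Python sketch. It is not mergekit's implementation; the function name `slerp`, the `eps` threshold, and the fallback to linear interpolation for nearly colinear vectors are assumptions made for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equal-length vectors.

    t = 0 returns v0, t = 1 returns v1. Illustrative sketch only.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Vectors are (anti)parallel: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Halfway between two orthogonal unit vectors stays on the unit circle.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

Unlike a plain weighted average, the midpoint here keeps unit length, which is the usual motivation for interpolating along the arc rather than the chord.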
### Models Merged
The following models were included in the merge:
* [sometimesanotion/Lamarck-14B-v0.7-rc4](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7-rc4)
* [sthenno/tempesthenno-ppo-ckpt40](https://huggingface.co/sthenno/tempesthenno-ppo-ckpt40)
### Configuration
The following YAML configuration was used to produce this model:
```yaml
# =============================================================================
# SuperMerge-14B-Simple
#
# This configuration merges only two components:
# - Base Model: Provides stable foundational features.
# Model: sometimesanotion/Lamarck-14B-v0.7-rc4
#
# - Reasoning Module: Drives enhanced mid-layer reasoning.
# Model: sthenno/tempesthenno-ppo-ckpt40
#
# The merge is performed using slerp with a V-shaped interpolation curve.
# Weighting across each 8-layer slice is tuned to balance core feature
# preservation with advanced reasoning.
# =============================================================================
name: SuperMerge-14B-Simple
merge_method: slerp
base_model: sometimesanotion/Lamarck-14B-v0.7-rc4
tokenizer_source: base
dtype: float32
out_dtype: bfloat16
parameters:
  int8_mask: true  # Optimize memory usage.
  normalize: true  # Ensure weights are on a comparable scale.
  rescale: false   # No additional rescaling necessary.
  # Interpolation curve for 6 slices (48 layers total):
  # Maintains a V-shaped emphasis for mid-layer processing.
  t: [0.1, 0.35, 0.85, 0.85, 0.35, 0.1]
slices:
  # ---------------------------------------------------------------------------
  # Slice 1 (Layers 0-8):
  # - Early layers: nearly pure base model with minimal PPO influence.
  # ---------------------------------------------------------------------------
  - sources:
      - model: sometimesanotion/Lamarck-14B-v0.7-rc4
        layer_range: [0, 8]
        parameters:
          weight: 0.95
      - model: sthenno/tempesthenno-ppo-ckpt40
        layer_range: [0, 8]
        parameters:
          weight: 0.05
  # ---------------------------------------------------------------------------
  # Slice 2 (Layers 8-16):
  # - Blend base with stronger PPO contributions to boost reasoning.
  # ---------------------------------------------------------------------------
  - sources:
      - model: sometimesanotion/Lamarck-14B-v0.7-rc4
        layer_range: [8, 16]
        parameters:
          weight: 0.4
      - model: sthenno/tempesthenno-ppo-ckpt40
        layer_range: [8, 16]
        parameters:
          weight: 0.6
  # ---------------------------------------------------------------------------
  # Slice 3 (Layers 16-24):
  # - Mid-layer: Prioritize advanced reasoning by increasing the PPO share.
  # ---------------------------------------------------------------------------
  - sources:
      - model: sometimesanotion/Lamarck-14B-v0.7-rc4
        layer_range: [16, 24]
        parameters:
          weight: 0.3
      - model: sthenno/tempesthenno-ppo-ckpt40
        layer_range: [16, 24]
        parameters:
          weight: 0.7
  # ---------------------------------------------------------------------------
  # Slice 4 (Layers 24-32):
  # - Continue the focus on reasoning with PPO while still retaining base traits.
  # ---------------------------------------------------------------------------
  - sources:
      - model: sometimesanotion/Lamarck-14B-v0.7-rc4
        layer_range: [24, 32]
        parameters:
          weight: 0.35
      - model: sthenno/tempesthenno-ppo-ckpt40
        layer_range: [24, 32]
        parameters:
          weight: 0.65
  # ---------------------------------------------------------------------------
  # Slice 5 (Layers 32-40):
  # - Re-stabilize the network with a stronger base model contribution.
  # ---------------------------------------------------------------------------
  - sources:
      - model: sometimesanotion/Lamarck-14B-v0.7-rc4
        layer_range: [32, 40]
        parameters:
          weight: 0.6
      - model: sthenno/tempesthenno-ppo-ckpt40
        layer_range: [32, 40]
        parameters:
          weight: 0.4
  # ---------------------------------------------------------------------------
  # Slice 6 (Layers 40-48):
  # - Final output layers: Maintain fluency with the base model augmented by PPO.
  # ---------------------------------------------------------------------------
  - sources:
      - model: sometimesanotion/Lamarck-14B-v0.7-rc4
        layer_range: [40, 48]
        parameters:
          weight: 0.6
      - model: sthenno/tempesthenno-ppo-ckpt40
        layer_range: [40, 48]
        parameters:
          weight: 0.4
```
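
To get a feel for how the six-point `t` list could translate into a per-layer interpolation factor, the sketch below expands it across 48 layers. It assumes the list is treated as evenly spaced anchor points that are linearly interpolated over layer depth. That is one reading of mergekit's gradient-style parameters and not a copy of its code. The helper `gradient_value` and the names `T_CURVE` and `NUM_LAYERS` are hypothetical.

```python
def gradient_value(anchors, position):
    """Linearly interpolate evenly spaced anchors at a position in [0, 1]."""
    if len(anchors) == 1:
        return anchors[0]
    scaled = position * (len(anchors) - 1)
    lo = int(scaled)
    hi = min(lo + 1, len(anchors) - 1)
    frac = scaled - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]


T_CURVE = [0.1, 0.35, 0.85, 0.85, 0.35, 0.1]  # from the configuration above
NUM_LAYERS = 48                               # layer count stated in the config comments

# Assumed mapping: layer index i sits at relative depth i / (NUM_LAYERS - 1).
per_layer_t = [
    gradient_value(T_CURVE, i / (NUM_LAYERS - 1)) for i in range(NUM_LAYERS)
]

for i in (0, 12, 24, 36, 47):
    print(f"layer {i:2d}: t = {per_layer_t[i]:.3f}")
```

Under these assumptions the factor is lowest at the first and last layers and highest in the middle of the stack, so the outer layers stay closest to the base model and the middle layers lean most toward `sthenno/tempesthenno-ppo-ckpt40`.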