MohammadOthman committed
Commit a384e03
1 parent: a8d4fae

Upload README.md with huggingface_hub

Files changed (1): README.md added (+28, −0)

# Mistral-Merge-7B-slerp

## Model Description
`Mistral-Merge-7B-slerp` is a merged model that uses spherical linear interpolation (SLERP) to blend the layers of two transformer-based models. The merge aims to combine the robust linguistic capabilities of `OpenPipe/mistral-ft-optimized-1218` with the nuanced understanding of `mlabonne/NeuralHermes-2.5-Mistral-7B` in a single 7B model.
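
A minimal usage sketch with the Hugging Face Transformers library is shown below; the repository id `MohammadOthman/Mistral-Merge-7B-slerp` and the prompt are assumptions for illustration, not part of the merge itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Assumed repository id for this merge; adjust if the model is hosted elsewhere.
repo_id = "MohammadOthman/Mistral-Merge-7B-slerp"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used for the merge
    device_map="auto",
)

prompt = "Explain spherical linear interpolation in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```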

## Configuration
The merge applies SLERP across all 32 transformer layers of the two source models, with `OpenPipe/mistral-ft-optimized-1218` as the base model. The YAML configuration used for the merge is shown below:

```yaml
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```

This configuration interpolates both the self-attention and MLP (multi-layer perceptron) sub-layers with a gradient of `t` weights across the depth of the network, while all other tensors use the default blend of `t = 0.5`, so that features from both source models are integrated layer by layer.
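
For intuition about what a single `t` value does, here is a rough PyTorch sketch of SLERP between two weight tensors. It is a simplified stand-in for what a merge tool does internally, not the exact implementation used to build this model; the `slerp` function and the layer-schedule comment are illustrative assumptions.

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t = 0 returns `a` (the base model's tensor), t = 1 returns `b`.
    Falls back to plain linear interpolation when the two tensors are
    nearly colinear, where the SLERP formula becomes unstable.
    """
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_unit = a_flat / (a_flat.norm() + eps)
    b_unit = b_flat / (b_flat.norm() + eps)
    dot = torch.clamp(a_unit @ b_unit, -1.0, 1.0)
    omega = torch.arccos(dot)          # angle between the two weight vectors
    if omega.abs() < 1e-4:             # nearly colinear: use ordinary lerp
        merged = (1 - t) * a_flat + t * b_flat
    else:
        merged = (torch.sin((1 - t) * omega) * a_flat
                  + torch.sin(t * omega) * b_flat) / torch.sin(omega)
    return merged.reshape(a.shape).to(a.dtype)

# The schedules above ([0, 0.5, 0.3, 0.7, 1] for self_attn, mirrored for mlp)
# are spread across the 32 layers, so early attention layers stay close to the
# base model (t near 0) while later ones lean toward NeuralHermes (t near 1);
# tensors not matched by a filter use the default t = 0.5.
```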