Commit bfe59bb by llmixer
1 parent: be82155

Create README.md

  1. README.md +89 -0
README.md ADDED
@@ -0,0 +1,89 @@
+ ---
+ base_model:
+ - meta-llama/Meta-Llama-3-70B-Instruct
+ license: llama3
+ language:
+ - en
+ pipeline_tag: text-generation
+ tags:
+ - merge
+ - frankenmerge
+ - 96b
+ ---
+ # BigWeave v33 105b
+
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/65a6db055c58475cf9e6def1/4CbbAN-X7ZWj702JrcCGH.png" width=600>
+
+ The BigWeave models aim to experimentally identify merge settings for increasing model performance. The version number merely tracks various attempts and is not a quality indicator. Only results demonstrating good performance are retained and shared.
+
+ # Prompting Format
+ llamav3
+
+ # Merge process
+ This is a self-merge of meta-llama/Meta-Llama-3-70B-Instruct. Middle layers are duplicated and various matrices are scaled according to the template by jukofyork as shown here: https://github.com/arcee-ai/mergekit/issues/198#issuecomment-2079950009
+
+ Merge configuration:
+ ```
+ const_tag: &MODEL meta-llama/Meta-Llama-3-70B-Instruct
+
+ const_tag: &RESIDUAL_SCALE_FACTOR 0.5
+ const_tag: &QK_ATTENUATION_FACTOR 0.7071067812
+ const_tag: &OUT_FACTOR 0.9
+
+ scale-filter-env: &scale_filter_env
+   parameters:
+     scale:
+       - filter: o_proj
+         value: *RESIDUAL_SCALE_FACTOR
+       - filter: down_proj
+         value: *RESIDUAL_SCALE_FACTOR
+       - filter: q_proj
+         value: *QK_ATTENUATION_FACTOR
+       - filter: k_proj
+         value: *QK_ATTENUATION_FACTOR
+       - filter: v_proj
+         value: *OUT_FACTOR
+       - filter: up_proj
+         value: *OUT_FACTOR
+       - value: 1.0
+
+ slices:
+   - sources:
+     - model: *MODEL
+       layer_range: [0, 19]
+   - sources:
+     - model: *MODEL
+       layer_range: [19, 20]
+       <<: *scale_filter_env
+
+   - sources:
+     - model: *MODEL
+       layer_range: [10, 29]
+   - sources:
+     - model: *MODEL
+       layer_range: [29, 30]
+       <<: *scale_filter_env
+
+   - sources:
+     - model: *MODEL
+       layer_range: [20, 39]
+   - sources:
+     - model: *MODEL
+       layer_range: [39, 40]
+       <<: *scale_filter_env
+
+   - sources:
+     - model: *MODEL
+       layer_range: [30, 49]
+   - sources:
+     - model: *MODEL
+       layer_range: [49, 50]
+       <<: *scale_filter_env
+
+   - sources:
+     - model: *MODEL
+       layer_range: [40, 80]
+
+ merge_method: passthrough
+ dtype: float16
+ ```
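
The slice list in the merge configuration can be expanded into a per-layer plan to see how deep the merged stack is and where the scaled layers land. The following is a minimal sketch, not part of the commit: the names `SLICES`, `SOURCE_LAYERS`, and `expand` are hypothetical, the ranges are copied by hand from the configuration, and `layer_range` is read as half-open `[start, end)`, which the final `[40, 80]` slice on an 80-layer model implies.

```python
# Layer ranges copied by hand from the merge configuration.
# Each range is treated as half-open: [start, end).
# The boolean marks slices that merge in scale_filter_env.
SLICES = [
    ((0, 19), False),
    ((19, 20), True),
    ((10, 29), False),
    ((29, 30), True),
    ((20, 39), False),
    ((39, 40), True),
    ((30, 49), False),
    ((49, 50), True),
    ((40, 80), False),
]

SOURCE_LAYERS = 80  # decoder layers in Meta-Llama-3-70B-Instruct

# Constant copied from the configuration (QK_ATTENUATION_FACTOR).
QK_ATTENUATION_FACTOR = 0.7071067812


def expand(slices):
    """Return (source_layer_index, is_scaled) for every layer of the merged stack."""
    plan = []
    for (start, end), scaled in slices:
        for idx in range(start, end):
            plan.append((idx, scaled))
    return plan


plan = expand(SLICES)
total = len(plan)
scaled_positions = [pos for pos, (_, scaled) in enumerate(plan) if scaled]
scaled_sources = [plan[pos][0] for pos in scaled_positions]

# Scaling q_proj and k_proj by the same factor scales the
# pre-softmax q.k scores by the square of that factor.
qk_score_scale = QK_ATTENUATION_FACTOR ** 2

print(f"merged layers: {total} ({total / SOURCE_LAYERS:.2f}x the source depth)")
print(f"scaled layers at merged positions: {scaled_positions}")
print(f"copied from source layers: {scaled_sources}")
print(f"q.k score scale in scaled layers: {qk_score_scale:.4f}")
```

Under these assumptions the merged model has 120 layers (1.5x the 80 source layers), and the scaled layers sit at merged positions 19, 39, 59, and 79, each being the last layer before the stack jumps back ten source layers. In those layers the combined q_proj and k_proj scaling reduces the attention scores by a factor of about 0.5.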