---
base_model:
- shisa-ai/shisa-v1-llama3-8b
- aixsatoshi/Llama-3-youko-8b-instruct-chatvector
- meta-llama/Meta-Llama-3-8B-Instruct
- lightblue/suzume-llama-3-8B-multilingual
library_name: transformers
tags:
- mergekit
- merge

---
# final_model

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the [linear](https://arxiv.org/abs/2203.05482) merge method, with [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) as the base.
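
With `merge_method: linear`, every parameter tensor in a given layer range becomes a weighted average of the corresponding tensors from the source models, and `normalize: 1.0` rescales each slice's weights to sum to 1 before averaging. The sketch below illustrates that arithmetic only; it is not mergekit's implementation, and the `linear_merge` helper and random stand-in tensors are made up for the example.

```python
import torch

def linear_merge(tensors: list[torch.Tensor], weights: list[float],
                 normalize: bool = True) -> torch.Tensor:
    """Weighted average of the same parameter taken from each source model."""
    total = sum(weights)
    if normalize and total != 0:
        weights = [w / total for w in weights]  # mirrors normalize: 1.0
    merged = torch.zeros_like(tensors[0])
    for tensor, weight in zip(tensors, weights):
        merged = merged + weight * tensor
    return merged

# The layers [0, 4) slice of the config below mixes four models with these weights.
weights = [
    0.4149739730274144,   # lightblue/suzume-llama-3-8B-multilingual
    0.6781276007090549,   # meta-llama/Meta-Llama-3-8B-Instruct
    0.34616999273932425,  # aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    1.3720042419649354,   # shisa-ai/shisa-v1-llama3-8b
]
tensors = [torch.randn(8, 8) for _ in weights]  # stand-ins for real parameters
merged = linear_merge(tensors, weights)
```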
### Models Merged

The following models were included in the merge:
* [shisa-ai/shisa-v1-llama3-8b](https://huggingface.co/shisa-ai/shisa-v1-llama3-8b)
* [aixsatoshi/Llama-3-youko-8b-instruct-chatvector](https://huggingface.co/aixsatoshi/Llama-3-youko-8b-instruct-chatvector)
* [lightblue/suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
dtype: bfloat16
merge_method: linear
parameters:
  int8_mask: 1.0
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 4]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.4149739730274144
  - layer_range: [0, 4]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.6781276007090549
  - layer_range: [0, 4]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.34616999273932425
  - layer_range: [0, 4]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.3720042419649354
- sources:
  - layer_range: [4, 8]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.07652836818139683
  - layer_range: [4, 8]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.234379009181979
  - layer_range: [4, 8]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 1.0146729889059811
  - layer_range: [4, 8]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.5811532109389872
- sources:
  - layer_range: [8, 12]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.5551700273906248
  - layer_range: [8, 12]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.7418501521559635
  - layer_range: [8, 12]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 1.442504375594772
  - layer_range: [8, 12]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.6475631873316974
- sources:
  - layer_range: [12, 16]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.4227647782669271
  - layer_range: [12, 16]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.2969869792284983
  - layer_range: [12, 16]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.7818773805802817
  - layer_range: [12, 16]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.8007371182560976
- sources:
  - layer_range: [16, 20]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.10979010874744283
  - layer_range: [16, 20]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.19009547180175693
  - layer_range: [16, 20]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.6064294349661996
  - layer_range: [16, 20]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.7630087852386511
- sources:
  - layer_range: [20, 24]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.219671192433268
  - layer_range: [20, 24]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.6303503074132494
  - layer_range: [20, 24]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.46265431269055757
  - layer_range: [20, 24]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.4662350856064592
- sources:
  - layer_range: [24, 28]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.1400550380200451
  - layer_range: [24, 28]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.031570135674053
  - layer_range: [24, 28]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.5760956440228217
  - layer_range: [24, 28]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.5264012437679564
- sources:
  - layer_range: [28, 32]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 1.2311282964552015
  - layer_range: [28, 32]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.43811773040605967
  - layer_range: [28, 32]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.5150682019605872
  - layer_range: [28, 32]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.342193342214983
```
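
To reproduce the merge, save the configuration above as `config.yaml` and run mergekit's CLI, e.g. `mergekit-yaml config.yaml ./final_model` (exact flags may differ between mergekit versions). The merged checkpoint then loads like any Llama-3 model; below is a minimal sketch with transformers, assuming the weights sit in a local `./final_model` directory (substitute this repo's Hub id to download instead).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./final_model"  # or this model's Hugging Face Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# Format the prompt with the bundled Llama-3-Instruct chat template.
messages = [{"role": "user", "content": "日本で一番高い山は何ですか？"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stopping tokens come from the model's generation config.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```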