ChuckMcSneed committed (verified)
Commit 95e5683
1 Parent(s): c361036

Upload 8 files

README.md CHANGED
@@ -1,3 +1,43 @@
- ---
- license: llama2
- ---
+ # BETTER THAN GOLIATH?!
+ I've merged the [Xwin LoRA that I made](https://huggingface.co/ChuckMcSneed/Xwin-LM-70B-V0.1-LORA) into [Euryale](https://huggingface.co/Sao10K/Euryale-1.3-L2-70B) and then merged the result with itself in a [goliath-style merge](/config.yml) using [mergekit](https://github.com/arcee-ai/mergekit). The resulting model performs better than [goliath](https://huggingface.co/alpindale/goliath-120b) on my tests (note: performance on tests is not necessarily performance in practice). Test it and have fun with it. I'll upload the Xwin-LORAEuryale selfmerge next.
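+ 
+ If you want to reproduce that final step yourself, the linked config.yml is a regular mergekit recipe, so it should run with mergekit's `mergekit-yaml` command. A minimal sketch of that invocation (the output directory is just a placeholder, not something shipped in this repo):
+ 
+ ```python
+ # Sketch: run the goliath-style passthrough merge described by config.yml
+ # with mergekit's CLI. Assumes `pip install mergekit`; the output path below
+ # is a placeholder, not an artifact of this repo.
+ import subprocess
+ 
+ subprocess.run(
+     [
+         "mergekit-yaml",          # mergekit's YAML-driven merge entry point
+         "config.yml",             # the recipe shipped in this repo
+         "./premerge-ex-ex-123b",  # placeholder output directory
+     ],
+     check=True,
+ )
+ ```
+ 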
+ # Ideas behind it
+ Since the creation of Goliath, I've been wondering if it was possible to make something even better. I've tried linear, passthrough, SLERP, and TIES merges, but I could not recreate the greatness of Goliath, at least not in a way that I liked in practical use. I knew LoRAs existed, but I didn't know how well they performed. I created a model named [Gembo](https://huggingface.co/ChuckMcSneed/Gembo-v1-70b) by merging a shitton of LoRAs together, and surprisingly it worked! In fact, it worked so well that it was the best model on my benchmarks until now. When I found [LoRD](https://github.com/thomasgauthier/LoRD), a tool that can extract a LoRA from any model, I knew I could do something even better.
+ 
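+ To make the LoRA-application step concrete, here is a rough sketch of how an extracted LoRA can be loaded on top of the other base model with peft and baked into its weights (paths are placeholders, not the exact script I used):
+ 
+ ```python
+ # Sketch: load a 70B base model, attach an extracted LoRA with peft, then fold
+ # the adapter into the base weights and save a standalone checkpoint.
+ # Paths are placeholders; dtype/device settings depend on available hardware.
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+ 
+ base_path = "Xwin-LM/Xwin-LM-70B-V0.1"  # base model to modify
+ lora_path = "./euryale-lora"            # placeholder: directory of the extracted LoRA
+ out_path = "./xwin-LORAeuryale-70B"     # placeholder output directory
+ 
+ base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
+ model = PeftModel.from_pretrained(base, lora_path)  # attach the LoRA adapter
+ merged = model.merge_and_unload()                   # merge LoRA deltas into the weights
+ 
+ merged.save_pretrained(out_path)
+ AutoTokenizer.from_pretrained(base_path).save_pretrained(out_path)
+ ```
+ 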
+ I extracted a LoRA from Euryale, then one from Xwin, and began testing. Merging the Euryale LoRA into Xwin, and the other way around, created better models that outperformed their parents:
+ 
+ |Name |Quant|Size|B |C |D |S |P |total|BCD|SP |
+ |-------------------------------------|-----|----|---|---|---|---|----|-----|---|-----|
+ |Sao10K/Euryale-1.3-L2-70B |Q6_K |70B |0 |2 |0 |3 |5 |10 |2 |8 |
+ |Sao10K/Euryale-1.3-L2-70B+xwin-lora |Q6_K |70B |2 |2 |1 |5.5|5.5 |16 |5 |11 |
+ |Xwin-LM/Xwin-LM-70B-V0.1 |Q6_K |70B |0 |1 |2 |5.5|5.25|13.75|3 |10.75|
+ |Xwin-LM/Xwin-LM-70B-V0.1+euryale-lora|Q6_K |70B |3 |2 |2 |6 |5 |18 |7 |11 |
+ 
+ The results seemed promising, so I continued testing, merging the models in a goliath-like way in different orders (EX = Euryale+LORAXwin; XE = Xwin+LORAEuryale). The results were even more surprising:
+ 
+ |Name |Quant|Size|B |C |D |S |P |total|BCD|SP |
+ |-------------------------------------|-----|----|---|---|---|---|----|-----|---|-----|
+ |alpindale/goliath-120b |Q6_K |120B|3 |2 |1 |6 |6 |18 |6 |12 |
+ |ChuckMcSneed/Premerge-EX-EX-123B (this model)|Q6_K |123B|2 |2 |1.5|7.25|6 |18.75|5.5|13.25|
+ |ChuckMcSneed/Premerge-EX-XE-123B |Q6_K |123B|2 |2 |2 |5.75|6 |17.75|6 |11.75|
+ |ChuckMcSneed/Premerge-XE-EX-123B |Q6_K |123B|2 |2 |2.5|6.75|5.5 |18.75|6.5|12.25|
+ |ChuckMcSneed/Premerge-XE-XE-123B |Q6_K |123B|3 |2 |2.5|7.25|5.25|20 |7.5|12.5 |
+ |Sao10K/Euryale-1.3-L2-70B+xwin-lora |Q6_K |70B |2 |2 |1 |5.5|5.5 |16 |5 |11 |
+ |Xwin-LM/Xwin-LM-70B-V0.1+euryale-lora|Q6_K |70B |3 |2 |2 |6 |5 |18 |7 |11 |
+ 
+ Contrary to my expectations, interleaving two different models was suboptimal in this case. The self-merge of Euryale-LORAXwin (this model) beat all of the other merges on the SP tests (creative writing), making it the highest-scoring model I've tested on those so far, while the self-merge of Xwin-LORAEuryale had the highest score overall.
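+ 
+ For reference, judging by the names, the four 123B recipes differ in which of the two 70B intermediates each 16-layer window is drawn from, while the window layout itself is the same overlapping pattern as this repo's [config.yml](/config.yml). A small sketch of how such configs can be laid out (the second model name is a hypothetical label for the Xwin+LoRA-Euryale intermediate; this is an illustration, not the exact mixed recipes):
+ 
+ ```python
+ # Sketch: build a goliath-style passthrough config from overlapping 16-layer
+ # windows (stride 8) over 80-layer donors. With first == second this reproduces
+ # the self-merge in config.yml; alternating donors gives an EX-XE/XE-EX-style mix.
+ # Nine windows of 16 layers stack into 144 layers (cf. num_hidden_layers in config.json).
+ import yaml  # pip install pyyaml
+ 
+ def interleave_config(first: str, second: str, n_layers: int = 80,
+                       window: int = 16, stride: int = 8) -> str:
+     slices = []
+     for i, start in enumerate(range(0, n_layers - window + 1, stride)):
+         model = first if i % 2 == 0 else second  # alternate donors per window
+         slices.append({"sources": [{"model": model,
+                                     "layer_range": [start, start + window]}]})
+     return yaml.safe_dump({"slices": slices,
+                            "merge_method": "passthrough",
+                            "dtype": "float16"}, sort_keys=False)
+ 
+ # Self-merge (this model's recipe) vs. a hypothetical mixed-order variant:
+ print(interleave_config("euryale-LORAxwin-70B", "euryale-LORAxwin-70B"))
+ print(interleave_config("euryale-LORAxwin-70B", "xwin-LORAeuryale-70B"))
+ ```
+ 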
+ # What it means
+ In the future, we can potentially get better models by controlled merging of LoRAs.
+ # Benchmarks
+ ### NeoEvalPlusN
+ [My meme benchmark.](https://huggingface.co/datasets/ChuckMcSneed/NeoEvalPlusN_benchmark)
+ |Name |Quant|Size|B |C |D |S |P |total|BCD|SP |
+ |-------------------------------------|-----|----|---|---|---|---|----|-----|---|-----|
+ |alpindale/goliath-120b |Q6_K |120B|3 |2 |1 |6 |6 |18 |6 |12 |
+ |ChuckMcSneed/Premerge-EX-EX-123B (this model)|Q6_K |123B|2 |2 |1.5|7.25|6 |18.75|5.5|13.25|
+ |ChuckMcSneed/Premerge-EX-XE-123B |Q6_K |123B|2 |2 |2 |5.75|6 |17.75|6 |11.75|
+ |ChuckMcSneed/Premerge-XE-EX-123B |Q6_K |123B|2 |2 |2.5|6.75|5.5 |18.75|6.5|12.25|
+ |ChuckMcSneed/Premerge-XE-XE-123B |Q6_K |123B|3 |2 |2.5|7.25|5.25|20 |7.5|12.5 |
+ |Sao10K/Euryale-1.3-L2-70B |Q6_K |70B |0 |2 |0 |3 |5 |10 |2 |8 |
+ |Sao10K/Euryale-1.3-L2-70B+xwin-lora |Q6_K |70B |2 |2 |1 |5.5|5.5 |16 |5 |11 |
+ |Xwin-LM/Xwin-LM-70B-V0.1 |Q6_K |70B |0 |1 |2 |5.5|5.25|13.75|3 |10.75|
+ |Xwin-LM/Xwin-LM-70B-V0.1+euryale-lora|Q6_K |70B |3 |2 |2 |6 |5 |18 |7 |11 |
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "_name_or_path": "/euryale-LORAxwin-70B",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 8192,
+   "initializer_range": 0.02,
+   "intermediate_size": 28672,
+   "max_position_embeddings": 4096,
+   "model_type": "llama",
+   "num_attention_heads": 64,
+   "num_hidden_layers": 144,
+   "num_key_value_heads": 8,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 10000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.37.2",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
config.yml ADDED
@@ -0,0 +1,30 @@
+ slices:
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [0, 16]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [8, 24]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [16, 32]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [24, 40]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [32, 48]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [40, 56]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [48, 64]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [56, 72]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [64, 80]
+ merge_method: passthrough
+ dtype: float16
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "additional_special_tokens": [
+     "<unk>",
+     "<s>",
+     "</s>"
+   ],
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<unk>",
+     "<s>",
+     "</s>"
+   ],
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": false,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": null,
+   "sp_model_kwargs": {},
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": true
+ }