ChuckMcSneed committed (verified)
Commit 95e5683
1 Parent(s): c361036

Upload 8 files

README.md CHANGED
@@ -1,3 +1,43 @@
- ---
- license: llama2
- ---
+ # BETTER THAN GOLIATH?!
+ I've merged the [Xwin LoRA that I made](https://huggingface.co/ChuckMcSneed/Xwin-LM-70B-V0.1-LORA) into [Euryale](https://huggingface.co/Sao10K/Euryale-1.3-L2-70B) and then merged the result with itself in a [goliath-style merge](/config.yml) using [mergekit](https://github.com/arcee-ai/mergekit). The resulting model performs better than [goliath](https://huggingface.co/alpindale/goliath-120b) on my tests (note: performance on tests is not necessarily performance in practice). Test it and have fun with it. I'll upload the Xwin-LORAEuryale selfmerge next.
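+ 
+ If you want to reproduce that final step yourself, the linked config.yml is a regular mergekit recipe, so it should run with mergekit's `mergekit-yaml` command. A minimal sketch of that invocation (the output directory is just a placeholder, not something shipped in this repo):
+ 
+ ```python
+ # Sketch: run the goliath-style passthrough merge described by config.yml
+ # with mergekit's CLI. Assumes `pip install mergekit`; the output path below
+ # is a placeholder, not an artifact of this repo.
+ import subprocess
+ 
+ subprocess.run(
+     [
+         "mergekit-yaml",          # mergekit's YAML-driven merge entry point
+         "config.yml",             # the recipe shipped in this repo
+         "./premerge-ex-ex-123b",  # placeholder output directory
+     ],
+     check=True,
+ )
+ ```
+ 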
+ # Ideas behind it
+ Since the creation of Goliath, I've been wondering if it was possible to make something even better. I've tried linear, passthrough, SLERP, and TIES merges, but I could not recreate the greatness of Goliath, at least not in a way that I liked in practical use. I knew LoRAs existed, but I didn't know how well they performed. I created a model named [Gembo](https://huggingface.co/ChuckMcSneed/Gembo-v1-70b) by merging a shitton of LoRAs together, and surprisingly it worked! In fact, it worked so well that it was the best model on my benchmarks until now. When I found [LoRD](https://github.com/thomasgauthier/LoRD), a tool that can extract a LoRA from any model, I knew I could do something even better.
+ 
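+ To make the LoRA-application step concrete, here is a rough sketch of how an extracted LoRA can be loaded on top of the other base model with peft and baked into its weights (paths are placeholders, not the exact script I used):
+ 
+ ```python
+ # Sketch: load a 70B base model, attach an extracted LoRA with peft, then fold
+ # the adapter into the base weights and save a standalone checkpoint.
+ # Paths are placeholders; dtype/device settings depend on available hardware.
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+ 
+ base_path = "Xwin-LM/Xwin-LM-70B-V0.1"  # base model to modify
+ lora_path = "./euryale-lora"            # placeholder: directory of the extracted LoRA
+ out_path = "./xwin-LORAeuryale-70B"     # placeholder output directory
+ 
+ base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
+ model = PeftModel.from_pretrained(base, lora_path)  # attach the LoRA adapter
+ merged = model.merge_and_unload()                   # merge LoRA deltas into the weights
+ 
+ merged.save_pretrained(out_path)
+ AutoTokenizer.from_pretrained(base_path).save_pretrained(out_path)
+ ```
+ 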
+ I extracted a LoRA from Euryale, then one from Xwin, and began testing. Merging the Euryale LoRA into Xwin, and the other way around, created better models that outperformed their parents:
+ 
+ |Name |Quant|Size|B |C |D |S |P |total|BCD|SP |
+ |-------------------------------------|-----|----|---|---|---|---|----|-----|---|-----|
+ |Sao10K/Euryale-1.3-L2-70B |Q6_K |70B |0 |2 |0 |3 |5 |10 |2 |8 |
+ |Sao10K/Euryale-1.3-L2-70B+xwin-lora |Q6_K |70B |2 |2 |1 |5.5|5.5 |16 |5 |11 |
+ |Xwin-LM/Xwin-LM-70B-V0.1 |Q6_K |70B |0 |1 |2 |5.5|5.25|13.75|3 |10.75|
+ |Xwin-LM/Xwin-LM-70B-V0.1+euryale-lora|Q6_K |70B |3 |2 |2 |6 |5 |18 |7 |11 |
+ 
+ The results seemed promising, so I continued testing, merging the models in a goliath-like way in different orders (EX = Euryale+LORAXwin; XE = Xwin+LORAEuryale). The results were even more surprising:
+ 
+ |Name |Quant|Size|B |C |D |S |P |total|BCD|SP |
+ |-------------------------------------|-----|----|---|---|---|---|----|-----|---|-----|
+ |alpindale/goliath-120b |Q6_K |120B|3 |2 |1 |6 |6 |18 |6 |12 |
+ |ChuckMcSneed/Premerge-EX-EX-123B (this model)|Q6_K |123B|2 |2 |1.5|7.25|6 |18.75|5.5|13.25|
+ |ChuckMcSneed/Premerge-EX-XE-123B |Q6_K |123B|2 |2 |2 |5.75|6 |17.75|6 |11.75|
+ |ChuckMcSneed/Premerge-XE-EX-123B |Q6_K |123B|2 |2 |2.5|6.75|5.5 |18.75|6.5|12.25|
+ |ChuckMcSneed/Premerge-XE-XE-123B |Q6_K |123B|3 |2 |2.5|7.25|5.25|20 |7.5|12.5 |
+ |Sao10K/Euryale-1.3-L2-70B+xwin-lora |Q6_K |70B |2 |2 |1 |5.5|5.5 |16 |5 |11 |
+ |Xwin-LM/Xwin-LM-70B-V0.1+euryale-lora|Q6_K |70B |3 |2 |2 |6 |5 |18 |7 |11 |
+ 
+ Contrary to my expectations, interleaving two different models was suboptimal in this case. The self-merge of Euryale-LORAXwin (this model) beat all of the other merges on the SP tests (creative writing), making it the highest-scoring model I've tested on those so far, while the self-merge of Xwin-LORAEuryale had the highest score overall.
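+ 
+ For reference, judging by the names, the four 123B recipes differ in which of the two 70B intermediates each 16-layer window is drawn from, while the window layout itself is the same overlapping pattern as this repo's [config.yml](/config.yml). A small sketch of how such configs can be laid out (the second model name is a hypothetical label for the Xwin+LoRA-Euryale intermediate; this is an illustration, not the exact mixed recipes):
+ 
+ ```python
+ # Sketch: build a goliath-style passthrough config from overlapping 16-layer
+ # windows (stride 8) over 80-layer donors. With first == second this reproduces
+ # the self-merge in config.yml; alternating donors gives an EX-XE/XE-EX-style mix.
+ # Nine windows of 16 layers stack into 144 layers (cf. num_hidden_layers in config.json).
+ import yaml  # pip install pyyaml
+ 
+ def interleave_config(first: str, second: str, n_layers: int = 80,
+                       window: int = 16, stride: int = 8) -> str:
+     slices = []
+     for i, start in enumerate(range(0, n_layers - window + 1, stride)):
+         model = first if i % 2 == 0 else second  # alternate donors per window
+         slices.append({"sources": [{"model": model,
+                                     "layer_range": [start, start + window]}]})
+     return yaml.safe_dump({"slices": slices,
+                            "merge_method": "passthrough",
+                            "dtype": "float16"}, sort_keys=False)
+ 
+ # Self-merge (this model's recipe) vs. a hypothetical mixed-order variant:
+ print(interleave_config("euryale-LORAxwin-70B", "euryale-LORAxwin-70B"))
+ print(interleave_config("euryale-LORAxwin-70B", "xwin-LORAeuryale-70B"))
+ ```
+ 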
+ # What it means
+ In the future, we can potentially get better models by controlled merging of LoRAs.
+ # Benchmarks
+ ### NeoEvalPlusN
+ [My meme benchmark.](https://huggingface.co/datasets/ChuckMcSneed/NeoEvalPlusN_benchmark)
+ |Name |Quant|Size|B |C |D |S |P |total|BCD|SP |
+ |-------------------------------------|-----|----|---|---|---|---|----|-----|---|-----|
+ |alpindale/goliath-120b |Q6_K |120B|3 |2 |1 |6 |6 |18 |6 |12 |
+ |ChuckMcSneed/Premerge-EX-EX-123B (this model)|Q6_K |123B|2 |2 |1.5|7.25|6 |18.75|5.5|13.25|
+ |ChuckMcSneed/Premerge-EX-XE-123B |Q6_K |123B|2 |2 |2 |5.75|6 |17.75|6 |11.75|
+ |ChuckMcSneed/Premerge-XE-EX-123B |Q6_K |123B|2 |2 |2.5|6.75|5.5 |18.75|6.5|12.25|
+ |ChuckMcSneed/Premerge-XE-XE-123B |Q6_K |123B|3 |2 |2.5|7.25|5.25|20 |7.5|12.5 |
+ |Sao10K/Euryale-1.3-L2-70B |Q6_K |70B |0 |2 |0 |3 |5 |10 |2 |8 |
+ |Sao10K/Euryale-1.3-L2-70B+xwin-lora |Q6_K |70B |2 |2 |1 |5.5|5.5 |16 |5 |11 |
+ |Xwin-LM/Xwin-LM-70B-V0.1 |Q6_K |70B |0 |1 |2 |5.5|5.25|13.75|3 |10.75|
+ |Xwin-LM/Xwin-LM-70B-V0.1+euryale-lora|Q6_K |70B |3 |2 |2 |6 |5 |18 |7 |11 |
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "_name_or_path": "/euryale-LORAxwin-70B",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 8192,
+   "initializer_range": 0.02,
+   "intermediate_size": 28672,
+   "max_position_embeddings": 4096,
+   "model_type": "llama",
+   "num_attention_heads": 64,
+   "num_hidden_layers": 144,
+   "num_key_value_heads": 8,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 10000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.37.2",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
config.yml ADDED
@@ -0,0 +1,30 @@
+ slices:
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [0, 16]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [8, 24]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [16, 32]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [24, 40]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [32, 48]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [40, 56]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [48, 64]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [56, 72]
+ - sources:
+   - model: euryale-LORAxwin-70B
+     layer_range: [64, 80]
+ merge_method: passthrough
+ dtype: float16
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "additional_special_tokens": [
+     "<unk>",
+     "<s>",
+     "</s>"
+   ],
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<unk>",
+     "<s>",
+     "</s>"
+   ],
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": false,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": null,
+   "sp_model_kwargs": {},
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": true
+ }