brucethemoose committed on
Commit
f4420b5
1 Parent(s): 7be3546

Update README.md

Files changed (1)
  1. README.md +3 -60
README.md CHANGED
@@ -10,14 +10,11 @@ tags:
  - text-generation-inference
  ---
 
- **Dolphin-2.2-yi-34b-200k**, **Nous-Capybara-34B**, **Tess-M-v1.4**, **Airoboros-3_1-yi-34b-200k**, **PlatYi-34B-200K-Q**, and **Una-xaberius-34b-v1beta** merged with a new, experimental implementation of "dare ties" via mergekit. See:
 
- > [Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch](https://github.com/yule-BUAA/MergeLM)
 
- > https://github.com/cg123/mergekit/tree/dare
-
-
- Merged with the following config, and the tokenizer from chargoddard's Yi-Llama:
  ```
  models:
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
@@ -54,57 +51,3 @@ int8_mask: true
  dtype: bfloat16
 
  ```
- ## Testing Notes
-
- Various densities were tested with perplexity tests and long-context prompts (see the sketch at the end of this section). Relatively high densities seem to perform better, contrary to the findings of the Super Mario paper.
-
- Weights that add up to 1 seem to be optimal.
-
- DARE ties also produces seemingly better, lower-perplexity merges than a regular ties merge, task arithmetic, or a SLERP merge.
-
- Xaberius is not a 200K model, hence it was merged at a very low density to try to preserve Yi 200K's long-context performance while still inheriting some of Xaberius's performance.
-
- I chose not to include other finetunes because they aren't trained on the 200K base. If any other 200K finetunes pop up, let me know.
-
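As an illustration of the kind of perplexity test mentioned above, here is a minimal strided-perplexity sketch. The model path, evaluation text, and window/stride sizes are placeholder assumptions, not the author's exact setup.

```python
# Minimal strided-perplexity sketch; paths and window sizes are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/merged-model"  # placeholder path to the merged checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

text = open("long_context_sample.txt").read()      # any long evaluation document
encodings = tokenizer(text, return_tensors="pt")
seq_len = encodings.input_ids.size(1)

max_length, stride = 4096, 2048                    # evaluation window and stride
nlls, prev_end = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end                       # tokens actually scored in this window
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100                # ignore the overlapping prefix
    with torch.no_grad():
        nlls.append(model(input_ids, labels=target_ids).loss)
    prev_end = end
    if end == seq_len:
        break

print("perplexity:", torch.exp(torch.stack(nlls).mean()).item())
```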
- ***
- ## Prompt template: Orca-Vicuna?
- ```
- SYSTEM: {system_message}
- USER: {prompt}
- ASSISTANT:
- ```
- It might also recognize ChatML from Dolphin and Xaberius, and Llama-chat from Airoboros.
-
- Sometimes the model "spells out" the stop token as `</s>`, like Capybara, so you may need to add `</s>` as an additional stopping condition.
-
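A short generation sketch showing the template plus `</s>` as an extra stop condition. It assumes `model` and `tokenizer` are already loaded (as in the perplexity sketch above); the system and user messages are invented examples, and `stop_strings` requires a reasonably recent transformers release.

```python
# Orca-Vicuna style prompt plus "</s>" as an extra stop string, in case the model
# spells the stop token out literally. Assumes model/tokenizer are already loaded.
prompt = (
    "SYSTEM: You are a helpful assistant.\n"   # illustrative system message
    "USER: Summarize the merge described above in two sentences.\n"
    "ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    stop_strings=["</s>"],   # extra stopping condition
    tokenizer=tokenizer,     # generate() needs the tokenizer to match stop strings
)
print(tokenizer.decode(output[0, inputs.input_ids.shape[1]:], skip_special_tokens=True))
```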
- ***
- ## Running
- Being a Yi model, try disabling the BOS token and/or running a lower temperature with 0.05-0.13 MinP, a little repetition penalty, and no other samplers. Yi tends to run "hot" by default.
-
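A sketch of sampler settings following that advice, for transformers `generate()`. Only the 0.05-0.13 MinP range comes from the note above; the temperature and repetition-penalty values are assumptions, and `min_p` support in transformers is relatively recent.

```python
# Illustrative sampler settings; exact values are assumptions, only the
# 0.05-0.13 MinP range is taken from the advice above.
generation_kwargs = dict(
    do_sample=True,
    temperature=0.7,          # run cooler than the default 1.0
    min_p=0.1,                # within the suggested 0.05-0.13 range
    repetition_penalty=1.05,  # "a little" repetition penalty
    top_p=1.0,                # disable the other samplers
    top_k=0,
    max_new_tokens=512,
)

# Disabling the BOS token: skip special tokens when tokenizing the prompt.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
output = model.generate(**inputs, **generation_kwargs)
```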
- 24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/).
-
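For reference, a sketch of loading an exl2 quantization at a reduced context with exllamav2. The model path, 49152-token limit, and 8-bit cache are assumptions for roughly 24GB of VRAM, not the author's exact settings.

```python
# Loading an exl2 quant at a reduced context length with exllamav2.
# Path, context limit, and 8-bit cache are assumptions for ~24GB of VRAM.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache_8bit, ExLlamaV2Config, ExLlamaV2Tokenizer

config = ExLlamaV2Config()
config.model_dir = "/path/to/exl2-quant"   # placeholder path to an exl2 quantization
config.prepare()
config.max_seq_len = 49152                 # well under the 200K maximum

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)   # 8-bit cache stretches the usable context
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)
```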
- I recommend exl2 quantizations profiled on data similar to the desired task. The model is especially sensitive to the quantization data at low bpw!
-
- To load this in full-context backends like transformers and vLLM, you *must* change `max_position_embeddings` in config.json to a value lower than 200,000, otherwise you will OOM!
-
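A minimal sketch of that edit; the path and the 32768 value are arbitrary examples, pick whatever context length actually fits your hardware.

```python
# Cap max_position_embeddings before loading in transformers/vLLM.
# The 32768 value is an arbitrary example, not a recommendation.
import json

config_path = "/path/to/model/config.json"   # placeholder path
with open(config_path) as f:
    cfg = json.load(f)

cfg["max_position_embeddings"] = 32768       # down from 200000
with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)
```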
- ***
-
- ## Credits:
-
- https://github.com/cg123/mergekit/tree/dare
-
- https://huggingface.co/ehartford/dolphin-2.2-yi-34b-200k
-
- https://huggingface.co/kyujinpy/PlatYi-34B-200K-Q
-
- https://huggingface.co/NousResearch/Nous-Capybara-34B/
-
- https://huggingface.co/bhenrym14/airoboros-3_1-yi-34b-200k
-
- https://huggingface.co/migtissera/Tess-M-v1.4
-
- https://huggingface.co/fblgit/una-xaberius-34b-v1beta
-
- https://huggingface.co/chargoddard/Yi-34B-200K-Llama
-
- https://huggingface.co/01-ai/Yi-34B-200K
 
  - text-generation-inference
  ---
 
+ A low-density DARE merge, for benchmarking on the Open LLM Leaderboard.
 
+ You probably shouldn't use this model. Use this higher-density merge instead, which scores much better in benchmarks and perplexity tests: https://huggingface.co/brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-HighDensity
 
+ mergekit config:
  ```
  models:
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
 
  dtype: bfloat16
 
  ```