brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties

@@ -10,14 +10,11 @@ tags:
 - text-generation-inference
 ---
-**Dolphin-2.2-yi-34b-200k**, **Nous-Capybara-34B**, **Tess-M-v1.4**, **Airoboros-3_1-yi-34b-200k**, **PlatYi-34B-200K-Q**, and **Una-xaberius-34b-v1beta** merged with a new, experimental implementation of "dare ties" via mergekit. See:
-> [Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch](https://github.com/yule-BUAA/MergeLM)
-> https://github.com/cg123/mergekit/tree/dare
-Merged with the following config, and the tokenizer from chargoddard's Yi-Llama:
 ```
 models:
   - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
@@ -54,57 +51,3 @@ int8_mask: true
 dtype: bfloat16
 ```
-## Testing Notes
-Various densities were tested with perplexity tests and long context prompts. Relatively high densities seem to perform better, contrary to the findings of the Super Mario paper.
-Weights that add up to 1 seem to be optimal.
-DARE Ties also seems to produce better, lower-perplexity merges than a regular TIES merge, task arithmetic, or a SLERP merge.
-Xaberius is not a 200K model, hence it was merged at a very low density to try to preserve Yi 200K's long context performance while still inheriting some of Xaberius's performance.
-I chose not to include other finetunes because they aren't trained on the 200K base. If any other 200K finetunes pop up, let me know.
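For readers unfamiliar with what "density" controls, here is a minimal sketch of the drop-and-rescale step described in the Super Mario paper, assuming density means the fraction of each finetune's delta (finetuned minus base weights) that is kept. The function name `dare_sparsify` and the toy weight lists are hypothetical, this is not mergekit's code, and the TIES sign-election step and per-model weights are left out.

```python
import random


def dare_sparsify(delta, density, rng):
    """Keep each delta entry with probability `density`; rescale survivors by 1/density.

    Hypothetical helper for illustration only, not part of mergekit.
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return [d / density if rng.random() < density else 0.0 for d in delta]


# Toy weights, made up for the example.
base = [0.10, -0.20, 0.30, 0.40]
finetuned = [0.15, -0.10, 0.30, 0.20]

# The delta is what the finetune changed relative to the base model.
delta = [f - b for f, b in zip(finetuned, base)]

rng = random.Random(0)  # fixed seed so the run is repeatable
sparse = dare_sparsify(delta, density=0.5, rng=rng)

# Add the sparsified, rescaled delta back onto the base weights.
merged = [b + s for b, s in zip(base, sparse)]
print(merged)
```

At a density of 1.0 nothing is dropped and the delta passes through unchanged; lower densities drop more entries and scale the survivors up to compensate.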
-***
-## Prompt template: Orca-Vicuna?
-```
-SYSTEM: {system_message}
-USER: {prompt}
-ASSISTANT:
-```
-It might recognize ChatML from Dolphin+Xaberius, and Llama-chat from Airoboros.
-Sometimes the model "spells out" the stop token as `</s>` like Capybara, so you may need to add `</s>` as an additional stopping condition.
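If the backend in use has no setting for extra stop strings, the same effect can be approximated by trimming the output afterwards. A minimal sketch, assuming the generated text is available as a plain string; `truncate_at_stop` is a hypothetical helper, not an API of any inference library.

```python
def truncate_at_stop(text, stop_strings=("</s>",)):
    """Cut `text` at the earliest occurrence of any stop string.

    Hypothetical post-processing helper; returns `text` unchanged if no stop string appears.
    """
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


print(truncate_at_stop("Hello there.</s>USER: hi"))
```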
-***
-## Running
-Being a Yi model, try disabling the BOS token and/or running a lower temperature with 0.05-0.13 MinP, a little repetition penalty, and no other samplers. Yi tends to run "hot" by default.
-24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/).
-I recommend exl2 quantizations profiled on data similar to the desired task. The model is especially sensitive to the quantization data at low bpw!
-To load this in full-context backends like transformers and vllm, you *must* change `max_position_embeddings` in config.json to a lower value than 200,000, otherwise you will OOM!
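The config.json edit can be done by hand or scripted. A minimal sketch, assuming config.json is ordinary JSON with a top-level `max_position_embeddings` key; `cap_context` is a hypothetical helper, and 32768 is only an example value to be sized to the available memory.

```python
import json
import tempfile
from pathlib import Path


def cap_context(config_path, max_positions):
    """Rewrite max_position_embeddings in a model's config.json and return the new config."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["max_position_embeddings"] = max_positions
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo on a throwaway file; point config_path at the model's real config.json instead.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "config.json"
    demo.write_text(json.dumps({"max_position_embeddings": 200000}))
    updated = cap_context(demo, 32768)  # 32768 is an assumed example value
    print(updated["max_position_embeddings"])
```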
-***
-## Credits:
-https://github.com/cg123/mergekit/tree/dare
-https://huggingface.co/ehartford/dolphin-2.2-yi-34b-200k
-https://huggingface.co/kyujinpy/PlatYi-34B-200K-Q
-https://huggingface.co/NousResearch/Nous-Capybara-34B/
-https://huggingface.co/bhenrym14/airoboros-3_1-yi-34b-200k
-https://huggingface.co/migtissera/Tess-M-v1.4
-https://huggingface.co/fblgit/una-xaberius-34b-v1beta
-https://huggingface.co/chargoddard/Yi-34B-200K-Llama
-https://huggingface.co/01-ai/Yi-34B-200K

 - text-generation-inference
 ---
+A low-density DARE Ties merge, for benchmarking on the Open LLM Leaderboard.
+You probably shouldn't use this model. Use this higher-density one instead, which scores much better in metrics and perplexity tests: https://huggingface.co/brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-HighDensity
+mergekit config:
 ```
 models:
   - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
 dtype: bfloat16
 ```