brucethemoose/Capybara-Tess-Yi-34B-200K-DARE-Ties-4bpw-exl2-fiction

NousResearch/Nous-Capybara-34B, migtissera/Tess-M-v1.2 and migtissera/Tess-M-v1.3 were merged with a new, experimental implementation of "dare ties" via mergekit. See:

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

https://github.com/yule-BUAA/MergeLM

https://github.com/cg123/mergekit/tree/dare-tokenizer

It was quantized with exllamav2 on 200 rows of 2048 tokens each (about 400K tokens) drawn from a long Vicuna-format chat, a single sci-fi story, and a single fantasy story. This should hopefully yield better chat performance than quantizing on the default wikitext calibration data.

Quantized to 4bpw, enough for ~45K context on a 24GB GPU.
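
As a rough back-of-envelope check on that figure, the sketch below estimates the size of the quantized weights alone. The parameter count of about 34.4 billion is an assumption not stated in this card, and the estimate ignores the higher-precision head layer, quantization metadata, activations, and the KV cache.

```python
def quantized_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB (weights only)."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30

# Assumed parameter count for a 34B-class model (hypothetical, approximate).
ASSUMED_PARAMS = 34.4e9

weights_gib = quantized_weight_gib(ASSUMED_PARAMS, 4.0)
print(f"Approximate weight footprint at 4.0 bpw: {weights_gib:.1f} GiB")
```

Whatever remains of the 24GB after the weights is what the context cache and activations have to fit into, so the usable context length depends heavily on the cache precision your loader uses.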

Merged with the following config, and the tokenizer from Yi Llamafied:

models:
  - model: /home/alpha/Storage/Models/Raw/larryvrh_Yi-34B-200K-Llamafied
    # no parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-M-v1.3
    parameters:
      weight: 0.50
      density: 0.56
  - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-M-v1.2
    parameters:
      weight: 0.20
      density: 0.50
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: 0.50
      density: 0.56
merge_method: dare_ties
base_model: /home/alpha/Storage/Models/Raw/larryvrh_Yi-34B-200K-Llamafied
parameters:
  int8_mask: true
dtype: bfloat16
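
To illustrate what `weight` and `density` control, here is a simplified NumPy sketch of the DARE-TIES idea on a single tensor. It is not mergekit's implementation: the function names are hypothetical, normalization options are omitted, and the sign-election step is a minimal reading of the TIES method.

```python
import numpy as np

def dare_prune(delta, density, rng):
    """Keep each delta entry with probability `density`, rescale survivors by 1/density."""
    keep = rng.random(delta.shape) < density
    return np.where(keep, delta / density, 0.0)

def dare_ties_merge(base, finetuned, weights, densities, seed=0):
    """Merge several finetuned tensors into `base` (simplified sketch)."""
    rng = np.random.default_rng(seed)
    # Weighted, randomly pruned task vectors (finetuned minus base).
    deltas = np.stack([
        w * dare_prune(ft - base, d, rng)
        for ft, w, d in zip(finetuned, weights, densities)
    ])
    # Elect a sign per parameter from the summed weighted deltas.
    elected = np.sign(deltas.sum(axis=0))
    # Only deltas that agree with the elected sign contribute to the merge.
    agree = np.sign(deltas) == elected
    return base + np.where(agree, deltas, 0.0).sum(axis=0)

# Toy usage with the weights and densities from the config above.
rng = np.random.default_rng(1)
base = rng.normal(size=(4, 4))
models = [base + rng.normal(scale=0.1, size=(4, 4)) for _ in range(3)]
merged = dare_ties_merge(base, models, [0.50, 0.20, 0.50], [0.56, 0.50, 0.56])
print(merged.shape)
```

Because surviving entries are rescaled by `1/density`, the pruned task vector matches the original one in expectation, which is why fairly aggressive dropping can still preserve each finetune's behavior.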

Tess 1.2 (at a low weight) and 1.3 were used because, according to the trainer, they were trained on different datasets: https://migel.substack.com/p/learnings-from-training-tess

I chose not to include other finetunes, such as Dolphin, because they aren't trained on the 200K base. If any other 200K finetunes pop up, let me know.

First exllama quantization pass, on 80 rows so it will fit in memory:

python convert.py --in_dir /home/alpha/FastModels/Capybara-Tess-34B-200K-DARE -o /home/alpha/FastModels/scratch -om /home/alpha/FastModels/capytess13mes.json --cal_dataset /home/alpha/Documents/smol.parquet -l 2048 -r 80 -ml 2048 -mr 40 -gr 40 -ss 4096 -nr -b 4.0 -hb 6

Second exllama quantization pass, on 200 rows:

python convert.py --in_dir /home/alpha/FastModels/Capybara-Tess-34B-200K-DARE -o /home/alpha/FastModels/scratch -m /home/alpha/FastModels/capytess13mes.json --cal_dataset /home/alpha/Documents/medium.parquet -l 2048 -r 200 -ml 2048 -mr 40 -gr 200 -ss 4096 -b 4.0 -hb 6 -cf /home/alpha/FastModels/Capybara-Tess-34B-200K-DARE-exl2-4bpw-fiction -nr

Prompt template: Orca-Vicuna

SYSTEM: {system_message}
USER: {prompt}
ASSISTANT:
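
A minimal sketch of assembling this template in Python; the `build_prompt` helper is hypothetical and only mirrors the three lines above.

```python
def build_prompt(prompt: str, system_message: str = "") -> str:
    """Format a single-turn prompt in the Orca-Vicuna layout shown above."""
    return f"SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:"

print(build_prompt("Write the opening line of a sci-fi story.", "You are a helpful writer."))
```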

Because this is a Yi model, try disabling the BOS token and/or running a lower temperature with MinP sampling if the output doesn't seem right.

Like Capybara, the model sometimes "spells out" the stop token as the literal text </s>, so you may need to add </s> as an additional stopping string.
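
If your frontend has no stopping-string setting, the same effect can be approximated by trimming the generated text afterwards. This is a minimal sketch with a hypothetical helper name, not part of any particular inference library.

```python
def truncate_at_stop(text: str, stop_strings=("</s>",)) -> str:
    """Cut generated text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop("The ship drifted on.</s>USER: continue"))
```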

Credits:

https://github.com/cg123/mergekit/tree/dare-tokenizer

https://huggingface.co/NousResearch/Nous-Capybara-34B/

https://huggingface.co/migtissera/Tess-M-v1.2

https://huggingface.co/migtissera/Tess-M-v1.3

https://huggingface.co/larryvrh/Yi-34B-200K-Llamafied

https://huggingface.co/01-ai/Yi-34B-200K