Model Stock: All we need is just a few fine-tuned models (arXiv:2403.19522)
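The merge method referenced above can be sketched in a few lines. This is a simplified illustration, assuming the paper's interpolation ratio t = k·cos θ / (1 + (k − 1)·cos θ) and applying it to whole weight vectors rather than per-layer as mergekit does; all names are illustrative.

```python
import numpy as np

def model_stock_merge(base, finetuned):
    """Interpolate between the base weights and the average of the
    fine-tuned weights, using the Model Stock ratio t (illustrative
    sketch; mergekit computes this per layer)."""
    k = len(finetuned)
    assert k >= 2, "Model Stock needs at least two fine-tuned models"
    center = np.mean(finetuned, axis=0)  # average of fine-tuned weights
    # Mean pairwise cosine similarity between task vectors (w_i - base)
    vecs = [w - base for w in finetuned]
    cos = np.mean([
        np.dot(vecs[i], vecs[j]) / (np.linalg.norm(vecs[i]) * np.linalg.norm(vecs[j]))
        for i in range(k) for j in range(i + 1, k)
    ])
    t = k * cos / (1 + (k - 1) * cos)  # interpolation ratio from the paper
    return t * center + (1 - t) * base
```

With orthogonal task vectors (cos θ = 0) the result collapses to the base; with identical fine-tunes (cos θ = 1) it is simply their average.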
The difference from normal quantizations is that the output and embedding tensors are quantized to f16, while the other tensors are quantized to q5_k, q6_k, or q8_0. This produces models with little or no degradation and a smaller file size. They run at about 3-6 t/s on CPU only using llama.cpp, and obviously faster on computers with potent GPUs.
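As an illustration (not necessarily the exact commands used for these files), the per-tensor overrides described above map onto llama.cpp's quantization flags; the file names are hypothetical:

```shell
# Keep output and token-embedding tensors at f16 while the
# remaining tensors use q5_k (hypothetical file names).
llama-quantize \
  --output-tensor-type f16 \
  --token-embedding-type f16 \
  model-f16.gguf model-q5_k.gguf q5_k
```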
The model combination has been readjusted to better fulfill various roles, and it has been adapted for mobile phones.

Recommended stop sequences for inference:

````
stop = [
    "## Instruction:",
    "### Instruction:",
    "<|end_of_text|>",
    " //:",
    "</s>",
    "<3```",
    "### Note:",
    "### Input:",
    "### Response:",
    "### Emoticons:"
]
````
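If a serving stack does not support stop sequences natively, the list above can be applied as post-hoc truncation. This is a generic sketch, not part of the model itself:

```python
# Stop sequences from the model card (one string is built by
# concatenation so it does not close this code fence).
STOP_SEQUENCES = [
    "## Instruction:", "### Instruction:", "<|end_of_text|>", " //:",
    "</s>", "<3" + "`" * 3, "### Note:", "### Input:",
    "### Response:", "### Emoticons:",
]

def truncate_at_stop(text: str, stops=STOP_SEQUENCES) -> str:
    """Cut generated text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for s in stops:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```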
This is a merge of pre-trained language models created using mergekit. The model was merged in three stages using the Model Stock merge method, with the intermediate merges ./llama3-8B-DarkIdol-2.3a and ./llama3-8B-DarkIdol-2.3b serving as bases for the later stages. The following YAML configurations were used to produce this model:
```yaml
models:
  - model: Sao10K/L3-8B-Niitama-v1
  - model: Hastagaras/Jamet-8B-L3-MK.V-Blackroot
  - model: Nitral-AI/Hathor_Tahsin-L3-8B-v0.85
  - model: turboderp/llama3-turbcat-instruct-8b
  - model: winglian/Llama-3-8b-64k-PoSE
merge_method: model_stock
base_model: winglian/Llama-3-8b-64k-PoSE
dtype: bfloat16
```
```yaml
models:
  - model: maldv/badger-writer-llama-3-8b
  - model: underwoods/writer-8b
  - model: Gryphe/Pantheon-RP-1.0-8b-Llama-3
  - model: vicgalle/Roleplay-Llama-3-8B
  - model: cgato/TheSalt-RP-L3-8b-DPO-v0.3.2-e0.15.2
  - model: ./llama3-8B-DarkIdol-2.3a
merge_method: model_stock
base_model: ./llama3-8B-DarkIdol-2.3a
dtype: bfloat16
```
```yaml
models:
  - model: Rupesh2/Meta-Llama-3-8B-abliterated
  - model: Orenguteng/Llama-3-8B-LexiFun-Uncensored-V1
  - model: Orenguteng/Llama-3-8B-Lexi-Uncensored
  - model: theprint/Llama-3-8B-Lexi-Smaug-Uncensored
  - model: vicgalle/Unsafe-Llama-3-8B
  - model: vicgalle/Configurable-Hermes-2-Pro-Llama-3-8B
  - model: ./llama3-8B-DarkIdol-2.3b
merge_method: model_stock
base_model: ./llama3-8B-DarkIdol-2.3b
dtype: bfloat16
```
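Each configuration can be run with mergekit's standard CLI, assuming a stock install; paths here are illustrative, and each stage's output directory becomes the next stage's base_model:

```shell
pip install mergekit
# Run one merge stage; repeat with the next config file,
# pointing its base_model at the previous output directory.
mergekit-yaml config.yml ./output-model
```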