
Oh no, he's dumb too! I have a working hypothesis: inverting and merging 20b Llama 2 models works quite well, evening out the gradients between slices. However, these 13b Mistrals seem to HATE it, presumably due to the unbalanced nature of my recipe. More study is required.

Recipe

merge_method: dare_ties

- base_model: athirdpath/BigMistral-13b
- model: athirdpath/CleverMage-Mistral-13b (weight: 0.60 / density: 0.35)
- model: athirdpath/CleverMage-Mistral-13b-INV (weight: 0.40 / density: 0.30)

int8_mask: true

dtype: bfloat16
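
For anyone who wants to reproduce or tweak the merge, the recipe above corresponds roughly to a mergekit configuration like the sketch below. The key nesting (`models:` / `parameters:`) follows standard dare_ties examples rather than the literal file used for this merge, so treat it as an approximation.

```yaml
# Approximate mergekit config for this merge; exact key layout is assumed.
models:
  - model: athirdpath/CleverMage-Mistral-13b
    parameters:
      weight: 0.60
      density: 0.35
  - model: athirdpath/CleverMage-Mistral-13b-INV
    parameters:
      weight: 0.40
      density: 0.30
merge_method: dare_ties
base_model: athirdpath/BigMistral-13b
parameters:
  int8_mask: true
dtype: bfloat16
```

Saved as config.yml, something along the lines of `mergekit-yaml config.yml ./merged` should produce a comparable merge, assuming a current mergekit install.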
