
llama-3-20b-instruct

This is an experimental dense merge of two copies of Meta-Llama-3-8B and one copy of Meta-Llama-3-8B-Instruct, created using mergekit. It is meant as an experiment and should be treated as such. The question is whether the instruct layers can tap into the knowledge carried by the two base-model slices and yield a stronger model.

UPDATE:

They do not. Having the base model supply the first layers may have been a mistake; I will try an instruct + base + instruct ordering to see if that works better.
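For reference, a follow-up configuration along those lines might look like the sketch below. The instruct + base + instruct ordering matches what the update describes, but the specific layer ranges are illustrative assumptions, not tested values:

slices:
  - sources:
    - model: meta-llama/Meta-Llama-3-8B-Instruct
      layer_range: [0, 24]   # illustrative range, not tested
  - sources:
    - model: meta-llama/Meta-Llama-3-8B
      layer_range: [8, 24]   # illustrative range, not tested
  - sources:
    - model: meta-llama/Meta-Llama-3-8B-Instruct
      layer_range: [8, 32]   # illustrative range, not tested
merge_method: passthrough
dtype: bfloat16
tokenizer_source: union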

License

The Meta-Llama-3-8B license applies to this model as well; it can be found on the Meta-Llama-3-8B model page.

Merge Details

Merge Method

This model was merged using the passthrough merge method, which stacks the specified layer slices end to end without averaging any weights. (I have not yet figured out how to use the other merge methods.)

Models Merged

The following models were included in the merge:

- meta-llama/Meta-Llama-3-8B
- meta-llama/Meta-Llama-3-8B-Instruct

Configuration

The following YAML configuration was used to produce this model:

slices:
  - sources:
    - model: meta-llama/Meta-Llama-3-8B
      layer_range: [0, 24]
  - sources:
    - model: meta-llama/Meta-Llama-3-8B
      layer_range: [8, 32]
  - sources:
    - model: meta-llama/Meta-Llama-3-8B-Instruct
      layer_range: [0, 32]
merge_method: passthrough
dtype: bfloat16
tokenizer_source: union
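To reproduce the merge, the configuration above can be passed to mergekit's command-line entry point. A minimal sketch, assuming the YAML is saved as config.yaml and mergekit is installed (the output directory name is an assumption):

mergekit-yaml config.yaml ./llama-3-20b-instruct --cuda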
Model size: 18.5B parameters (BF16, safetensors)
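Once published, the merged weights load like any other Llama 3 checkpoint. A minimal sketch using transformers, assuming the repository id AtakanTekparmak/llama-3-20b-instruct and an available GPU (device_map="auto" requires accelerate):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AtakanTekparmak/llama-3-20b-instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge dtype
    device_map="auto",
)

# Use the Llama 3 chat template, since the final layers come from the instruct model.
messages = [{"role": "user", "content": "Explain model merging in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))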