# llama-3-20b-instruct
This is an experimental dense merge of two copies of Meta-Llama-3-8B and one Meta-Llama-3-8B-Instruct, created using mergekit. This is meant as an experiment and should be treated as such. The question is whether the instruct model can tap into the knowledge of the two base-model copies and yield a better model.
UPDATE:
It does not. Having the base model for the first layers may have been a mistake; I will try instruct + base + instruct to see if that works better.
## License
The Meta-Llama-3-8B license applies to this model as well; it can be found here.
## Merge Details

### Merge Method
This model was merged using the passthrough merge method, which simply stacks the specified layer slices on top of one another without blending any weights. (I haven't figured out how to use the other methods yet.)
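As a rough sanity check on the resulting size, here is a minimal sketch of how the three layer slices from the configuration below stack up, assuming Meta-Llama-3-8B's 32 transformer layers and mergekit's half-open `[start, end)` convention for `layer_range`:

```python
# Slice ranges taken from the configuration below: base, base, instruct.
# Each [start, end) range contributes end - start transformer layers.
slices = [(0, 24), (8, 32), (0, 32)]

total_layers = sum(end - start for start, end in slices)
print(total_layers)  # 80 layers, vs. 32 in a single 8B model (2.5x the blocks)
```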
### Models Merged

The following models were included in the merge:

- meta-llama/Meta-Llama-3-8B
- meta-llama/Meta-Llama-3-8B-Instruct

### Configuration
The following YAML configuration was used to produce this model:
```yaml
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3-8B
        layer_range: [0, 24]
  - sources:
      - model: meta-llama/Meta-Llama-3-8B
        layer_range: [8, 32]
  - sources:
      - model: meta-llama/Meta-Llama-3-8B-Instruct
        layer_range: [0, 32]
merge_method: passthrough
dtype: bfloat16
tokenizer_source: union
```
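The merge can be reproduced by saving this configuration to a file and running mergekit's `mergekit-yaml` CLI on it (e.g. `mergekit-yaml config.yaml ./llama-3-20b-instruct`). Once merged, the model should load like any other Llama 3 checkpoint; below is a minimal sketch with transformers, assuming the hypothetical repo id `llama-3-20b-instruct` and that the instruct model's chat template survives the union tokenizer merge:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical identifier; point this at wherever the merged weights live.
model_id = "llama-3-20b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used for the merge
    device_map="auto",
)

messages = [{"role": "user", "content": "In one sentence, what is a model merge?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```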