# llama-3-20b-instruct
This is an experimental dense merge of two copies of Meta-Llama-3-8B and one Meta-Llama-3-8B-Instruct, created using mergekit. This is meant as an experiment and should be treated as such. The question is whether the instruct model can tap into the knowledge of the two base-model copies and yield a better model.
UPDATE:
It does not. Having the base model for the first layers may have been a mistake; I will try instruct + base + instruct to see if that works better.
## License
The Meta-Llama-3-8B license applies to this model as well; it can be found here.
## Merge Details

### Merge Method
This model was merged using the passthrough merge method, which simply stacks the specified layer slices on top of one another without blending any weights. (I haven't figured out how to use the other methods yet.)
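As a rough sanity check on the resulting size, here is a minimal sketch of how the three layer slices from the configuration below stack up, assuming Meta-Llama-3-8B's 32 transformer layers and mergekit's half-open `[start, end)` convention for `layer_range`:

```python
# Slice ranges taken from the configuration below: base, base, instruct.
# Each [start, end) range contributes end - start transformer layers.
slices = [(0, 24), (8, 32), (0, 32)]

total_layers = sum(end - start for start, end in slices)
print(total_layers)  # 80 layers, vs. 32 in a single 8B model (2.5x the blocks)
```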
### Models Merged

The following models were included in the merge:

- meta-llama/Meta-Llama-3-8B
- meta-llama/Meta-Llama-3-8B-Instruct

### Configuration
The following YAML configuration was used to produce this model:
```yaml
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3-8B
        layer_range: [0, 24]
  - sources:
      - model: meta-llama/Meta-Llama-3-8B
        layer_range: [8, 32]
  - sources:
      - model: meta-llama/Meta-Llama-3-8B-Instruct
        layer_range: [0, 32]
merge_method: passthrough
dtype: bfloat16
tokenizer_source: union
```
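The merge can be reproduced by saving this configuration to a file and running mergekit's `mergekit-yaml` CLI on it (e.g. `mergekit-yaml config.yaml ./llama-3-20b-instruct`). Once merged, the model should load like any other Llama 3 checkpoint; below is a minimal sketch with transformers, assuming the hypothetical repo id `llama-3-20b-instruct` and that the instruct model's chat template survives the union tokenizer merge:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical identifier; point this at wherever the merged weights live.
model_id = "llama-3-20b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used for the merge
    device_map="auto",
)

messages = [{"role": "user", "content": "In one sentence, what is a model merge?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```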