---
license: other
license_name: llama-3
license_link: https://llama.meta.com/llama3/license/
tags:
- llama-3
- llama
- '3'
- 5B
---
This is just an experiment, similar to the one done on [chargoddard/llama3-42b-v0](https://huggingface.co/chargoddard/llama3-42b-v0). After pruning, the model was fine-tuned ("healed") with QLoRA on the code DPO dataset [AlekseyKorshuk/evol-codealpaca-v1-dpo](https://huggingface.co/datasets/AlekseyKorshuk/evol-codealpaca-v1-dpo). Due to limitations, it was trained for only 3150 of 4935 steps (~64% of the data). I had to restart training about halfway through, so the logs are split in two. I am still unsure whether the tokenizer is correct.

Loss: ~1.2

<img src="https://i.imgur.com/AnuMlv7.png">

<img src="https://i.imgur.com/kHXnKCU.png">

<img src="https://i.imgur.com/aHKVgqT.png">

<img src="https://i.imgur.com/KTLYnjl.png">

mergekit.yaml
```yaml
slices:
  - sources:
      - model: ./Meta-Llama-3-8B-Instruct/
        layer_range: [0, 15]
  - sources:
      - model: ./Meta-Llama-3-8B-Instruct/
        layer_range: [29, 32]
merge_method: passthrough
dtype: bfloat16
```
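Since mergekit slice ranges are half-open, this passthrough merge keeps decoder layers 0-14 and 29-31, i.e. 18 of the original 32, which is where the ~5B parameter count in the tags comes from. A minimal sanity check along those lines (the model path below is a hypothetical placeholder for wherever mergekit wrote its output):

```python
from transformers import AutoConfig

# Hypothetical path; point this at the mergekit output directory.
merged_path = "./llama3-pruned-5b"

cfg = AutoConfig.from_pretrained(merged_path)

# layer_range [0, 15] keeps layers 0-14 (15 layers); [29, 32] keeps 29-31 (3),
# so the pruned model should report 18 hidden layers.
assert cfg.num_hidden_layers == 18, f"unexpected depth: {cfg.num_hidden_layers}"
```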
ORPOConfig
```python
from trl import ORPOConfig

# out_dir_folder is the output directory variable defined elsewhere in the
# training script.
orpo_config = ORPOConfig(
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_length=1024,
    max_prompt_length=512,
    overwrite_output_dir=False,
    beta=0.1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    num_train_epochs=1,
    evaluation_strategy="steps",
    eval_steps=0.02,
    logging_steps=1,
    warmup_steps=50,
    report_to="wandb",
    output_dir=out_dir_folder,
    fp16=True,
    save_steps=50,
)
```
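For context, here is a minimal sketch of how this config could be plugged into the QLoRA healing run with TRL's `ORPOTrainer`. The 4-bit quantization settings, LoRA hyperparameters, and model path below are illustrative assumptions rather than the exact values used; it also assumes the `orpo_config` (and `out_dir_folder`) defined above.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOTrainer

model_path = "./llama3-pruned-5b"  # hypothetical mergekit output from above

# QLoRA: load the pruned base model with 4-bit weights (assumed settings).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_path, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Assumed LoRA adapter settings; the card does not state the ones actually used.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# The dataset provides prompt/chosen/rejected preference pairs; hold out a
# small eval split so evaluation_strategy="steps" has something to evaluate on.
ds = load_dataset("AlekseyKorshuk/evol-codealpaca-v1-dpo", split="train")
ds = ds.train_test_split(test_size=0.01)

trainer = ORPOTrainer(
    model=model,
    args=orpo_config,          # the ORPOConfig defined above
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```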