jrahn's picture
Adding Evaluation Results (#1)
2edff46 verified
---
language:
- en
license: llama3
library_name: peft
tags:
- generated_from_trainer
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- Norquinal/claude_multi_instruct_30k
model-index:
- name: llama-3-8b-claudstruct-v3
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>
axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer # PreTrainedTokenizerFast
load_in_8bit: false
load_in_4bit: true
strict: false
chat_template: llama3
datasets:
- path: Norquinal/claude_multi_instruct_30k
type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./outputs/llama-3-8b-claudstruct-v3/
adapter: qlora
lora_model_dir:
sequence_len: 512
sample_packing: false
pad_to_sequence_len: true
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 1
micro_batch_size: 8
num_epochs: 2
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.00001
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
- full_shard
- auto_wrap
fsdp_config:
fsdp_limit_all_gathers: true
fsdp_sync_module_states: true
fsdp_offload_params: true
fsdp_use_orig_params: false
fsdp_cpu_ram_efficient_loading: true
fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
fsdp_state_dict_type: FULL_STATE_DICT
fsdp_sharding_strategy: FULL_SHARD
special_tokens:
pad_token: <|end_of_text|>
```
</details><br>
# llama-3-8b-claudstruct-v3
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the [Norquinal/claude_multi_instruct_30k](https://huggingface.co/datasets/Norquinal/claude_multi_instruct_30k) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.6226
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 2
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.2209 | 0.0007 | 1 | 2.0399 |
| 1.7842 | 0.2502 | 341 | 1.6960 |
| 1.6914 | 0.5004 | 682 | 1.6590 |
| 1.6757 | 0.7506 | 1023 | 1.6414 |
| 1.5182 | 1.0007 | 1364 | 1.6319 |
| 1.8421 | 1.2509 | 1705 | 1.6264 |
| 1.7271 | 1.5011 | 2046 | 1.6237 |
| 1.4817 | 1.7513 | 2387 | 1.6226 |
### Framework versions
- PEFT 0.11.1
- Transformers 4.41.1
- Pytorch 2.3.0
- Datasets 2.19.1
- Tokenizers 0.19.1
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_jrahn__llama-3-8b-claudstruct-v3)
| Metric |Value|
|---------------------------------|----:|
|Avg. |65.62|
|AI2 Reasoning Challenge (25-Shot)|58.96|
|HellaSwag (10-Shot) |80.05|
|MMLU (5-Shot) |64.55|
|TruthfulQA (0-shot) |51.76|
|Winogrande (5-shot) |74.19|
|GSM8k (5-shot) |64.22|