---
library_name: peft
tags:
  - generated_from_trainer
base_model: /GenAI4HW/llama2_13b
metrics:
  - accuracy
model-index:
  - name: >-
      outputs/llama2-13B-lora-QuArch_0_1_1_alpaca_filtered-answer-context-test-new
    results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`

```yaml
## General
# base_model: meta-llama/Meta-Llama-3-8B-Instruct
base_model: /GenAI4HW/llama2_13b
# base_model: meta-llama/Llama-2-13b
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
# tokenizer_type: LlamaTokenizer
output_dir: ./outputs/llama2-13B-lora-QuArch_0_1_1_alpaca_filtered-answer-context-test-new
seed: 42

## Data Configuration
datasets:
  # - path: ./data/QuArch_v0_1_0_alpaca_w_context.json             # With abstract
  # - path: ./data/QuArch_v0_1_1_alpaca_format.json                # With justification
  # - path: ./data/QuArch_v0_1_0_alpaca_mmlu.json                  # Without justification
  - path: ./data/QuArch_v0_1_1_alpaca_filtered_context/ 
    type: alpaca
    data_file: train
    
dataset_prepared_path:

test_datasets:
  - path: ./data/QuArch_v0_1_1_alpaca_filtered_context/
    type: alpaca
    split: test
    data_file: 
        - test  
  
  # - path: ./data/QuArch_v0_1_1_alpaca_filtered_context/
  #   type: alpaca
  #   split: val
  #   data_file: 
  #       - val

## Model Configuration
load_in_8bit: false
load_in_4bit: false
strict: false
bf16: auto
fp16:
tf32: false
device_map: 'auto'  

## LoRA Configuration
adapter: lora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_model_dir:
lora_fan_in_fan_out:

## Logging Configuration
logging_dir: ./logs
logging_steps: 10
wandb_project: 
wandb_entity: 
wandb_watch: 
wandb_name: 
wandb_log_model: 

do_eval: true
## Training Configuration
sequence_len: 1024
sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false
micro_batch_size: 1
gradient_accumulation_steps: 16
num_epochs: 30
warmup_steps: 10 
weight_decay: 0.01
optimizer: adamw_torch
lr_scheduler: linear
learning_rate: 2e-5
gradient_checkpointing: false
saves_per_epoch: 1
# save_steps: 0
# save_strategy: steps
save_total_limit: 30
load_best_model_at_end: true
greater_is_better: true
early_stopping_patience:
resume_from_checkpoint:
remove_unused_columns: true

## Evaluation Configuration
eval_sample_packing: False
eval_batch_size: 1
evals_per_epoch: 1
# evaluation_strategy: epoch
eval_max_new_tokens: 32
eval_table_size:
# max_new_token: 32
# eval_causal_lm_metrics: sacrebleu

# Others
local_rank: 
xformers_attention:
flash_attention: true
s2_attention:
debug:
deepspeed:
fsdp:
fsdp_config:
special_tokens:
    # pad_token: <|end_of_text|>
```

</details>

# outputs/llama2-13B-lora-QuArch_0_1_1_alpaca_filtered-answer-context-test-new

This model is a LoRA fine-tune of the Llama 2 13B checkpoint at `/GenAI4HW/llama2_13b` on the QuArch_v0_1_1_alpaca_filtered_context dataset (see the Axolotl config above). It achieves the following results on the evaluation set:

- Loss: 0.0432
- Accuracy: 0.9808

## Model description

This repository contains a LoRA adapter (`lora_r: 32`, `lora_alpha: 16`, `lora_dropout: 0.05`, targeting all linear layers) for the Llama 2 13B base model at `/GenAI4HW/llama2_13b`, trained with Axolotl 0.4.0 on the QuArch_v0_1_1_alpaca_filtered_context dataset.

## Intended uses & limitations

More information needed
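
The sketch below is a minimal, untested example of loading this LoRA adapter on top of the base model with Transformers and PEFT. The base model path comes from the Axolotl config above but is local to the original training environment, and the adapter path is a placeholder; point both at checkpoints you actually have. The Alpaca-style prompt is an assumption based on the `type: alpaca` dataset setting.

```python
# Minimal sketch (not from the original authors): attach this LoRA adapter to the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "/GenAI4HW/llama2_13b"  # local path from the Axolotl config; substitute your own Llama 2 13B checkpoint
adapter_path = "path/to/this-adapter"     # placeholder: this repository or a local copy of the adapter weights

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,  # bf16 was enabled during training (bf16: auto)
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()

# Alpaca-style prompt (assumed from the `type: alpaca` dataset setting).
prompt = "### Instruction:\nAnswer the question.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

`max_new_tokens=32` mirrors the config's `eval_max_new_tokens`; adjust it for longer generations.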

## Training and evaluation data

Per the Axolotl config above, training uses the `train` split of `./data/QuArch_v0_1_1_alpaca_filtered_context/` in Alpaca format, and evaluation uses the `test` split of the same dataset.
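
For reference only, the Alpaca format expected by Axolotl's `type: alpaca` loader uses `instruction`/`input`/`output` fields per record; the example below is an invented illustration, not an actual QuArch record.

```python
# Hypothetical Alpaca-format record; the text is made up for illustration
# and does not come from the QuArch dataset.
example_record = {
    "instruction": "Answer the following question.",
    "input": "Question text (and answer choices, if any) goes here.",
    "output": "The expected answer.",
}
```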

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 32 (see the sketch after this list)
- total_eval_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10
- num_epochs: 30
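
As a quick sanity check on the totals above (a trivial sketch, not part of the training code), the effective batch sizes follow directly from the per-device settings:

```python
# Derivation of the reported effective batch sizes from the per-device settings above.
micro_batch_size = 1              # train_batch_size per device
gradient_accumulation_steps = 16
num_devices = 2
eval_batch_size = 1               # eval batch size per device

assert micro_batch_size * gradient_accumulation_steps * num_devices == 32  # total_train_batch_size
assert eval_batch_size * num_devices == 2                                  # total_eval_batch_size
```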

### Training results

| Training Loss | Epoch   | Step | Validation Loss | Accuracy |
|:-------------:|:-------:|:----:|:---------------:|:--------:|
| No log        | 0.2105  | 1    | 5.1322          | 0.6154   |
| No log        | 0.8421  | 4    | 5.1271          | 0.6346   |
| No log        | 1.6842  | 8    | 5.0601          | 0.6538   |
| 5.1323        | 2.5263  | 12   | 4.7743          | 0.7885   |
| 5.1323        | 3.3684  | 16   | 4.0491          | 0.9231   |
| 4.2735        | 4.2105  | 20   | 2.6444          | 0.8846   |
| 4.2735        | 5.0526  | 24   | 1.0551          | 0.9615   |
| 4.2735        | 5.8947  | 28   | 0.4698          | 0.6923   |
| 1.2232        | 6.7368  | 32   | 0.3224          | 0.6731   |
| 1.2232        | 7.5789  | 36   | 0.2527          | 1.0      |
| 0.3083        | 8.4211  | 40   | 0.1972          | 1.0      |
| 0.3083        | 9.2632  | 44   | 0.1372          | 0.9615   |
| 0.3083        | 10.1053 | 48   | 0.0803          | 1.0      |
| 0.1761        | 10.9474 | 52   | 0.0575          | 0.9808   |
| 0.1761        | 11.7895 | 56   | 0.0475          | 0.9808   |
| 0.116         | 12.6316 | 60   | 0.0444          | 0.9808   |
| 0.116         | 13.4737 | 64   | 0.0463          | 0.9808   |
| 0.116         | 14.3158 | 68   | 0.0489          | 0.9808   |
| 0.0814        | 15.1579 | 72   | 0.0495          | 0.9808   |
| 0.0814        | 16.0    | 76   | 0.0481          | 0.9808   |
| 0.0709        | 16.8421 | 80   | 0.0469          | 0.9808   |
| 0.0709        | 17.6842 | 84   | 0.0457          | 0.9808   |
| 0.0709        | 18.5263 | 88   | 0.0455          | 0.9808   |
| 0.0632        | 19.3684 | 92   | 0.0454          | 0.9808   |
| 0.0632        | 20.2105 | 96   | 0.0459          | 0.9808   |
| 0.0569        | 21.0526 | 100  | 0.0458          | 0.9808   |
| 0.0569        | 21.8947 | 104  | 0.0446          | 0.9808   |
| 0.0569        | 22.7368 | 108  | 0.0451          | 0.9808   |
| 0.055         | 23.5789 | 112  | 0.0446          | 0.9808   |
| 0.055         | 24.4211 | 116  | 0.0452          | 0.9808   |
| 0.0581        | 25.2632 | 120  | 0.0432          | 0.9808   |

### Framework versions

- PEFT 0.10.0
- Transformers 4.41.2
- Pytorch 2.1.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1