See axolotl config

axolotl version: `0.7.0`

```yaml
base_model: NousResearch/Meta-Llama-3.1-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
hub_model_id: bkciccar/llama-3.1-8b-instruct-culture-lora

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: json
    data_files: culture_shuffle.jsonl
    split: train
    type: alpaca

test_datasets:
  - path: json
    data_files:
      - culturalbench-hard-preprocessed.jsonl
    split: train
    type:
      system_prompt: Below is a question with a potential answer. Please respond with only 'true' or 'false'.
      field_system:
      field_instruction: prompt_question
      field_input: prompt_option
      field_output: answer
      format: |-
        ### Question:
        {instruction}
        ### Option:
        {input}
        ### Response (true or false):

dataset_prepared_path:
output_dir: /scratch/bkciccar/outputs/llama-3.1-8b-lora

sequence_len: 4096
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out: false
lora_modules_to_save:
  - embed_tokens
  - lm_head

wandb_project: "CultureBank_FineTuning"
wandb_entity: "bcicc"
wandb_watch:
wandb_name: "Lora_FineTuning_Run_02"
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
eval_batch_size: 10
num_epochs: 4
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
eval_table_size:
eval_max_new_tokens: 5
evals_per_epoch: 16
saves_per_epoch: 4
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|end_of_text|>

overrides_of_trainer_kwargs:
  compute_metrics: "custom_metrics.compute_metrics"
```
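The `type` block under `test_datasets` defines a custom prompt template for the CulturalBench-Hard evaluation set. As a rough illustration, the sketch below renders one record the way that template describes; the question, option, and answer values are invented placeholders, not rows from `culturalbench-hard-preprocessed.jsonl`.

```python
# Sketch of how the evaluation template above renders a single record.
# The record values here are hypothetical; real rows come from
# culturalbench-hard-preprocessed.jsonl.
SYSTEM_PROMPT = (
    "Below is a question with a potential answer. "
    "Please respond with only 'true' or 'false'."
)

TEMPLATE = (
    "### Question:\n{instruction}\n"
    "### Option:\n{input}\n"
    "### Response (true or false):\n"
)

record = {
    "prompt_question": "Is it customary to remove shoes before entering a home?",  # hypothetical
    "prompt_option": "Yes, in many households this is expected.",                  # hypothetical
    "answer": "true",                                                               # hypothetical
}

prompt = TEMPLATE.format(
    instruction=record["prompt_question"],  # field_instruction
    input=record["prompt_option"],          # field_input
)
print(SYSTEM_PROMPT)
print(prompt)
# The model is scored on completing the prompt with record["answer"] (field_output).
```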
# llama-3.1-8b-instruct-culture-lora

This model is a LoRA fine-tune of NousResearch/Meta-Llama-3.1-8B-Instruct on the `culture_shuffle.jsonl` dataset (loaded with the `json` dataset type). It achieves the following results on the evaluation set:
- Loss: 2.5461
## Model description
More information needed
## Intended uses & limitations
More information needed
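In the absence of documented usage guidance, here is a minimal inference sketch using standard Transformers and PEFT APIs; the prompt content is a made-up placeholder and the generation settings are illustrative only, not the author's recommended setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "NousResearch/Meta-Llama-3.1-8B-Instruct"
ADAPTER = "bkciccar/llama-3.1-8b-instruct-culture-lora"

# If tokenizer files were not pushed with the adapter, load it from BASE instead.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
# The adapter also saves embed_tokens and lm_head (lora_modules_to_save),
# which PEFT loads alongside the LoRA weights.
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

# Hypothetical prompt following the evaluation template from the config.
prompt = (
    "Below is a question with a potential answer. "
    "Please respond with only 'true' or 'false'.\n"
    "### Question:\nIs it customary to remove shoes before entering a home?\n"
    "### Option:\nYes, in many households this is expected.\n"
    "### Response (true or false):\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```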
## Training and evaluation data
More information needed
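The config above trains on `culture_shuffle.jsonl` with `type: alpaca` and evaluates on `culturalbench-hard-preprocessed.jsonl` with the custom template shown earlier. For orientation only, an alpaca-style training row carries `instruction`, `input`, and `output` fields; the values in this sketch are invented and do not come from the actual dataset.

```python
import json

# Hypothetical example of one alpaca-format training row; the real contents
# of culture_shuffle.jsonl are not shown in this card.
train_row = {
    "instruction": "Describe a common etiquette norm in the given context.",  # hypothetical
    "input": "Dining at a colleague's home",                                   # hypothetical
    "output": "Guests often bring a small gift for the host.",                 # hypothetical
}

# Each line of the JSONL file is one such JSON object.
print(json.dumps(train_row, ensure_ascii=False))
```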
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 10
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: paged_adamw_8bit (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 4.0
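The config also routes evaluation through `overrides_of_trainer_kwargs` with `compute_metrics: "custom_metrics.compute_metrics"`, a module that is not published with this card. A hypothetical sketch of the interface the Hugging Face `Trainer` expects (an `EvalPrediction` in, a dict of named floats out) might look like this:

```python
import numpy as np

# Hypothetical sketch only: the actual custom_metrics module is not included here.
# eval_pred is a transformers.EvalPrediction with .predictions and .label_ids.
def compute_metrics(eval_pred):
    predictions, labels = eval_pred.predictions, eval_pred.label_ids
    # For a causal LM, predictions are typically logits of shape
    # (batch, seq_len, vocab); take the argmax to get predicted token ids.
    # A real implementation may also need to shift predictions/labels by one
    # position to align next-token targets.
    if predictions.ndim == 3:
        predictions = np.argmax(predictions, axis=-1)
    # Ignore padded positions, which are labeled -100 by convention.
    mask = labels != -100
    accuracy = (predictions[mask] == labels[mask]).mean() if mask.any() else 0.0
    return {"token_accuracy": float(accuracy)}
```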
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
1.9524 | 0.0031 | 1 | 4.8913 |
1.7809 | 0.0652 | 21 | 3.4064 |
1.546 | 0.1303 | 42 | 2.4681 |
1.4622 | 0.1955 | 63 | 2.7218 |
1.4921 | 0.2607 | 84 | 2.9218 |
1.3685 | 0.3258 | 105 | 2.9830 |
1.3769 | 0.3910 | 126 | 3.0171 |
1.3545 | 0.4562 | 147 | 2.9413 |
1.3076 | 0.5213 | 168 | 2.9677 |
1.348 | 0.5865 | 189 | 2.9398 |
1.3773 | 0.6517 | 210 | 2.7952 |
1.3463 | 0.7168 | 231 | 2.7154 |
1.2929 | 0.7820 | 252 | 2.7142 |
1.3266 | 0.8472 | 273 | 2.7186 |
1.283 | 0.9123 | 294 | 2.7368 |
1.3002 | 0.9775 | 315 | 2.6642 |
1.3143 | 1.0403 | 336 | 2.6567 |
1.3153 | 1.1055 | 357 | 2.6529 |
1.25 | 1.1707 | 378 | 2.5628 |
1.2879 | 1.2358 | 399 | 2.5440 |
1.2793 | 1.3010 | 420 | 2.5021 |
1.2602 | 1.3662 | 441 | 2.6023 |
1.2722 | 1.4313 | 462 | 2.5679 |
1.231 | 1.4965 | 483 | 2.5696 |
1.2678 | 1.5617 | 504 | 2.6337 |
1.2661 | 1.6268 | 525 | 2.5937 |
1.2665 | 1.6920 | 546 | 2.5784 |
1.2655 | 1.7572 | 567 | 2.5441 |
1.3415 | 1.8223 | 588 | 2.5772 |
1.2492 | 1.8875 | 609 | 2.5519 |
1.2046 | 1.9527 | 630 | 2.5300 |
1.2731 | 2.0155 | 651 | 2.5886 |
1.2637 | 2.0807 | 672 | 2.5399 |
1.2628 | 2.1458 | 693 | 2.5214 |
1.2477 | 2.2110 | 714 | 2.5174 |
1.24 | 2.2762 | 735 | 2.5024 |
1.2603 | 2.3413 | 756 | 2.5612 |
1.2505 | 2.4065 | 777 | 2.5594 |
1.2459 | 2.4717 | 798 | 2.5561 |
1.2634 | 2.5369 | 819 | 2.4952 |
1.2029 | 2.6020 | 840 | 2.5080 |
1.2593 | 2.6672 | 861 | 2.5153 |
1.158 | 2.7324 | 882 | 2.5123 |
1.2832 | 2.7975 | 903 | 2.5380 |
1.2801 | 2.8627 | 924 | 2.5191 |
1.1838 | 2.9279 | 945 | 2.5267 |
1.2102 | 2.9930 | 966 | 2.5323 |
1.2958 | 3.0559 | 987 | 2.5298 |
1.2847 | 3.1210 | 1008 | 2.5263 |
1.1752 | 3.1862 | 1029 | 2.5244 |
1.2475 | 3.2514 | 1050 | 2.5180 |
1.2407 | 3.3165 | 1071 | 2.5161 |
1.2478 | 3.3817 | 1092 | 2.5279 |
1.1969 | 3.4469 | 1113 | 2.5171 |
1.1802 | 3.5120 | 1134 | 2.5435 |
1.2196 | 3.5772 | 1155 | 2.5194 |
1.1793 | 3.6424 | 1176 | 2.5250 |
1.2863 | 3.7075 | 1197 | 2.5148 |
1.2437 | 3.7727 | 1218 | 2.5327 |
1.1947 | 3.8379 | 1239 | 2.5291 |
1.281 | 3.9030 | 1260 | 2.5136 |
1.1866 | 3.9682 | 1281 | 2.5461 |
### Framework versions
- PEFT 0.14.0
- Transformers 4.48.3
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
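When reproducing these results, it may help to confirm that the local environment matches the versions listed above; a small check, assuming the packages are installed under their usual distribution names:

```python
from importlib.metadata import version

# Print installed versions of the libraries listed in this card.
for pkg in ["peft", "transformers", "torch", "datasets", "tokenizers"]:
    print(pkg, version(pkg))
```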