---
license: llama2
base_model: meta-llama/Llama-2-7b-hf
tags:
- generated_from_trainer
datasets:
- tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3
metrics:
- accuracy
model-index:
- name: lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_1e-4_lora2
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3
      type: tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.6445803921568627
---
# lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_1e-4_lora2

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on the tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3 dataset. It achieves the following results on the evaluation set:
- Loss: 2.7443
- Accuracy: 0.6446
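The `_lora2` suffix indicates a LoRA fine-tune, so inference requires loading the adapter on top of the base model. A minimal sketch with PEFT, assuming the adapter is hosted under this card's model name (the repo id may differ from the actual hosted path):

```python
# Hedged sketch: load the LoRA adapter on top of the base Llama-2 model.
BASE_MODEL = "meta-llama/Llama-2-7b-hf"
# Assumed repo id, taken from this card's model name.
ADAPTER_ID = "tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_1e-4_lora2"

def load_model():
    # Imports are local so the sketch can be read without the libraries installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    # Attach the LoRA weights to the frozen base model.
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    return tokenizer, model
```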
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 50.0
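The effective (total) train batch size follows from the per-device batch size, gradient accumulation, and device count; a quick sanity check:

```python
# Effective batch size implied by the hyperparameters above:
# per-device batch size x gradient accumulation steps x number of devices.
train_batch_size = 2
gradient_accumulation_steps = 4
num_devices = 4

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 32, matching total_train_batch_size above
```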
### Training results

| Training Loss | Epoch | Step | Accuracy | Validation Loss |
|:-------------:|:-----:|:----:|:--------:|:---------------:|
| 1.3346        | 1.0   | 187  | 0.6671   | 1.2008          |
| 1.1638        | 2.0   | 375  | 0.6681   | 1.1940          |
| 1.0624        | 3.0   | 562  | 0.6676   | 1.2021          |
| 0.9457        | 4.0   | 750  | 0.6663   | 1.2319          |
| 0.8373        | 5.0   | 937  | 0.6639   | 1.2842          |
| 0.7159        | 6.0   | 1125 | 0.6607   | 1.3518          |
| 0.5964        | 7.0   | 1312 | 0.6582   | 1.4532          |
| 0.4861        | 8.0   | 1500 | 0.6549   | 1.5512          |
| 0.3754        | 9.0   | 1687 | 0.6529   | 1.6544          |
| 0.2938        | 10.0  | 1875 | 0.6505   | 1.7852          |
| 0.2268        | 11.0  | 2062 | 0.6490   | 1.9338          |
| 0.1792        | 12.0  | 2250 | 0.6479   | 2.0116          |
| 0.1418        | 13.0  | 2437 | 0.6470   | 2.1431          |
| 0.1171        | 14.0  | 2625 | 0.6447   | 2.2358          |
| 0.1038        | 15.0  | 2812 | 0.6461   | 2.3164          |
| 0.0958        | 16.0  | 3000 | 0.6452   | 2.3597          |
| 0.0848        | 17.0  | 3187 | 0.6453   | 2.4430          |
| 0.0804        | 18.0  | 3375 | 0.6441   | 2.4833          |
| 0.0786        | 19.0  | 3562 | 0.6439   | 2.4723          |
| 0.0786        | 20.0  | 3750 | 0.6437   | 2.5403          |
| 0.0792        | 21.0  | 3937 | 0.6441   | 2.4761          |
| 0.0792        | 22.0  | 4125 | 0.6447   | 2.5409          |
| 0.0781        | 23.0  | 4312 | 0.6449   | 2.5628          |
| 0.0766        | 24.0  | 4500 | 0.6446   | 2.5601          |
| 0.0709        | 25.0  | 4687 | 0.6453   | 2.5480          |
| 0.07          | 26.0  | 4875 | 0.6455   | 2.6145          |
| 0.0704        | 27.0  | 5062 | 0.6437   | 2.6258          |
| 0.073         | 28.0  | 5250 | 0.6449   | 2.5735          |
| 0.0738        | 29.0  | 5437 | 0.6441   | 2.6097          |
| 0.0727        | 30.0  | 5625 | 0.6427   | 2.5475          |
| 0.0727        | 31.0  | 5812 | 0.6435   | 2.6130          |
| 0.0715        | 32.0  | 6000 | 0.6441   | 2.6316          |
| 0.0679        | 33.0  | 6187 | 0.6442   | 2.5900          |
| 0.0684        | 34.0  | 6375 | 0.6445   | 2.6209          |
| 0.0676        | 35.0  | 6562 | 0.6452   | 2.6090          |
| 0.068         | 36.0  | 6750 | 0.6451   | 2.6729          |
| 0.0682        | 37.0  | 6937 | 0.6456   | 2.6381          |
| 0.0695        | 38.0  | 7125 | 0.6441   | 2.7113          |
| 0.07          | 39.0  | 7312 | 0.6438   | 2.6791          |
| 0.0709        | 40.0  | 7500 | 0.6444   | 2.6901          |
| 0.0662        | 41.0  | 7687 | 0.6455   | 2.6341          |
| 0.0664        | 42.0  | 7875 | 0.6451   | 2.7369          |
| 0.0658        | 43.0  | 8062 | 0.6452   | 2.6964          |
| 0.0677        | 44.0  | 8250 | 0.6442   | 2.6634          |
| 0.0668        | 45.0  | 8437 | 0.6436   | 2.7614          |
| 0.0657        | 46.0  | 8625 | 0.6446   | 2.7360          |
| 0.0656        | 47.0  | 8812 | 0.6441   | 2.7653          |
| 0.0658        | 48.0  | 9000 | 0.6453   | 2.7756          |
| 0.0626        | 49.0  | 9187 | 0.6464   | 2.7578          |
| 0.0666        | 49.87 | 9350 | 0.6446   | 2.7443          |
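As a consistency check, the step counts in the table follow from the effective batch size, assuming the training split has 6000 examples (as the `train6000` fragment of the dataset name suggests):

```python
# Steps per epoch implied by the hyperparameters above, under the assumption
# that the training split contains 6000 examples.
num_train_examples = 6000  # assumed from "train6000" in the dataset name
total_train_batch_size = 32

steps_per_epoch = num_train_examples / total_train_batch_size
print(steps_per_epoch)  # 187.5, matching the alternating 187/188-step epochs
```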
### Framework versions
- Transformers 4.34.0
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.14.1