thorirhrafn committed
Commit fed5015
Parent: df40a0a

End of training

Files changed (1):
  1. README.md +29 -29
README.md CHANGED
@@ -18,15 +18,15 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0739
-- Rewards/chosen: 0.4632
-- Rewards/rejected: -2.2899
+- Loss: 0.1045
+- Rewards/chosen: 0.4197
+- Rewards/rejected: -1.9316
 - Rewards/accuracies: 1.0
-- Rewards/margins: 2.7530
-- Logps/rejected: -207.7081
-- Logps/chosen: -156.0022
-- Logits/rejected: -1.0521
-- Logits/chosen: -0.8598
+- Rewards/margins: 2.3513
+- Logps/rejected: -204.1257
+- Logps/chosen: -156.4368
+- Logits/rejected: -1.0515
+- Logits/chosen: -0.8584
 
 ## Model description
 
@@ -45,7 +45,7 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 9e-07
+- learning_rate: 7.5e-07
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
@@ -59,26 +59,26 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6694 | 0.1 | 25 | 0.6365 | 0.0370 | -0.0813 | 0.9433 | 0.1183 | -185.6225 | -160.2637 | -1.0521 | -0.8545 |
-| 0.5526 | 0.2 | 50 | 0.5246 | 0.1015 | -0.2765 | 0.9967 | 0.3780 | -187.5744 | -159.6185 | -1.0524 | -0.8560 |
-| 0.4607 | 0.3 | 75 | 0.4173 | 0.1669 | -0.5106 | 1.0 | 0.6775 | -189.9152 | -158.9647 | -1.0530 | -0.8562 |
-| 0.3595 | 0.4 | 100 | 0.3251 | 0.2304 | -0.7635 | 1.0 | 0.9940 | -192.4449 | -158.3297 | -1.0530 | -0.8567 |
-| 0.297 | 0.5 | 125 | 0.2521 | 0.2883 | -1.0189 | 1.0 | 1.3072 | -194.9990 | -157.7509 | -1.0526 | -0.8573 |
-| 0.2217 | 0.6 | 150 | 0.1968 | 0.3313 | -1.2778 | 1.0 | 1.6090 | -197.5871 | -157.3212 | -1.0525 | -0.8576 |
-| 0.1832 | 0.7 | 175 | 0.1539 | 0.3750 | -1.5241 | 1.0 | 1.8991 | -200.0504 | -156.8834 | -1.0531 | -0.8606 |
-| 0.1374 | 0.79 | 200 | 0.1238 | 0.4055 | -1.7491 | 1.0 | 2.1546 | -202.3004 | -156.5787 | -1.0525 | -0.8614 |
-| 0.116 | 0.89 | 225 | 0.1027 | 0.4306 | -1.9426 | 1.0 | 2.3732 | -204.2353 | -156.3275 | -1.0526 | -0.8606 |
-| 0.095 | 0.99 | 250 | 0.0898 | 0.4405 | -2.0888 | 1.0 | 2.5293 | -205.6978 | -156.2289 | -1.0523 | -0.8603 |
-| 0.0921 | 1.09 | 275 | 0.0831 | 0.4465 | -2.1733 | 1.0 | 2.6198 | -206.5422 | -156.1685 | -1.0524 | -0.8593 |
-| 0.0734 | 1.19 | 300 | 0.0793 | 0.4520 | -2.2224 | 1.0 | 2.6744 | -207.0332 | -156.1135 | -1.0519 | -0.8627 |
-| 0.0711 | 1.29 | 325 | 0.0766 | 0.4558 | -2.2584 | 1.0 | 2.7142 | -207.3936 | -156.0763 | -1.0520 | -0.8592 |
-| 0.0806 | 1.39 | 350 | 0.0754 | 0.4630 | -2.2725 | 1.0 | 2.7355 | -207.5350 | -156.0041 | -1.0520 | -0.8599 |
-| 0.079 | 1.49 | 375 | 0.0748 | 0.4622 | -2.2779 | 1.0 | 2.7401 | -207.5887 | -156.0115 | -1.0522 | -0.8602 |
-| 0.0711 | 1.59 | 400 | 0.0746 | 0.4615 | -2.2817 | 1.0 | 2.7432 | -207.6269 | -156.0192 | -1.0519 | -0.8603 |
-| 0.0689 | 1.69 | 425 | 0.0744 | 0.4624 | -2.2862 | 1.0 | 2.7486 | -207.6718 | -156.0103 | -1.0522 | -0.8594 |
-| 0.0809 | 1.79 | 450 | 0.0742 | 0.4631 | -2.2887 | 1.0 | 2.7518 | -207.6965 | -156.0032 | -1.0517 | -0.8610 |
-| 0.0759 | 1.89 | 475 | 0.0740 | 0.4629 | -2.2902 | 1.0 | 2.7531 | -207.7117 | -156.0047 | -1.0517 | -0.8594 |
-| 0.0758 | 1.99 | 500 | 0.0739 | 0.4632 | -2.2899 | 1.0 | 2.7530 | -207.7081 | -156.0022 | -1.0521 | -0.8598 |
+| 0.6732 | 0.1 | 25 | 0.6518 | 0.0274 | -0.0584 | 0.8867 | 0.0858 | -185.3935 | -160.3602 | -1.0521 | -0.8541 |
+| 0.588 | 0.2 | 50 | 0.5616 | 0.0780 | -0.2093 | 0.9933 | 0.2873 | -186.9026 | -159.8541 | -1.0523 | -0.8550 |
+| 0.5077 | 0.3 | 75 | 0.4690 | 0.1360 | -0.3896 | 1.0 | 0.5256 | -188.7056 | -159.2737 | -1.0525 | -0.8564 |
+| 0.4179 | 0.4 | 100 | 0.3872 | 0.1873 | -0.5861 | 1.0 | 0.7734 | -190.6710 | -158.7608 | -1.0532 | -0.8563 |
+| 0.3614 | 0.5 | 125 | 0.3170 | 0.2381 | -0.7895 | 1.0 | 1.0276 | -192.7043 | -158.2528 | -1.0533 | -0.8568 |
+| 0.2812 | 0.6 | 150 | 0.2544 | 0.2856 | -1.0121 | 1.0 | 1.2977 | -194.9309 | -157.7783 | -1.0527 | -0.8569 |
+| 0.2378 | 0.7 | 175 | 0.2066 | 0.3262 | -1.2240 | 1.0 | 1.5502 | -197.0494 | -157.3717 | -1.0520 | -0.8573 |
+| 0.1866 | 0.79 | 200 | 0.1704 | 0.3591 | -1.4222 | 1.0 | 1.7812 | -199.0312 | -157.0431 | -1.0526 | -0.8577 |
+| 0.1555 | 0.89 | 225 | 0.1429 | 0.3829 | -1.6050 | 1.0 | 1.9879 | -200.8594 | -156.8051 | -1.0523 | -0.8580 |
+| 0.1312 | 0.99 | 250 | 0.1239 | 0.4002 | -1.7534 | 1.0 | 2.1536 | -202.3439 | -156.6322 | -1.0515 | -0.8572 |
+| 0.1276 | 1.09 | 275 | 0.1147 | 0.4086 | -1.8325 | 1.0 | 2.2410 | -203.1341 | -156.5480 | -1.0518 | -0.8578 |
+| 0.1038 | 1.19 | 300 | 0.1094 | 0.4144 | -1.8779 | 1.0 | 2.2923 | -203.5883 | -156.4901 | -1.0511 | -0.8574 |
+| 0.101 | 1.29 | 325 | 0.1072 | 0.4191 | -1.9023 | 1.0 | 2.3214 | -203.8326 | -156.4429 | -1.0512 | -0.8569 |
+| 0.1128 | 1.39 | 350 | 0.1056 | 0.4189 | -1.9206 | 1.0 | 2.3394 | -204.0154 | -156.4454 | -1.0511 | -0.8576 |
+| 0.11 | 1.49 | 375 | 0.1047 | 0.4220 | -1.9262 | 1.0 | 2.3482 | -204.0712 | -156.4135 | -1.0509 | -0.8570 |
+| 0.1001 | 1.59 | 400 | 0.1048 | 0.4224 | -1.9281 | 1.0 | 2.3505 | -204.0909 | -156.4098 | -1.0514 | -0.8574 |
+| 0.0978 | 1.69 | 425 | 0.1042 | 0.4246 | -1.9292 | 1.0 | 2.3538 | -204.1014 | -156.3875 | -1.0512 | -0.8573 |
+| 0.1111 | 1.79 | 450 | 0.1041 | 0.4244 | -1.9292 | 1.0 | 2.3536 | -204.1017 | -156.3903 | -1.0514 | -0.8587 |
+| 0.1064 | 1.89 | 475 | 0.1044 | 0.4199 | -1.9317 | 1.0 | 2.3516 | -204.1266 | -156.4352 | -1.0514 | -0.8577 |
+| 0.107 | 1.99 | 500 | 0.1045 | 0.4197 | -1.9316 | 1.0 | 2.3513 | -204.1257 | -156.4368 | -1.0515 | -0.8584 |
 
 
 ### Framework versions
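
The Rewards/margins column reported in the card is simply Rewards/chosen minus Rewards/rejected, i.e. the gap a preference-optimization run is trained to widen. A quick sanity check against the final evaluation row of the updated card (the snippet is illustrative, not part of the training code):

```python
# Final evaluation metrics from the updated README (epoch 1.99, step 500).
rewards_chosen = 0.4197
rewards_rejected = -1.9316

# The reported margin should equal chosen minus rejected.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 2.3513, matching the reported Rewards/margins
```

The same identity holds for every row of the table, up to rounding in the last digit.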
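
The metric names (Rewards/chosen, Rewards/rejected, Logps/*, Logits/*) match the convention of TRL-style DPO training, where a response's implicit reward is beta times its policy-vs-reference log-probability gap and the loss is the negative log-sigmoid of the reward margin. A minimal sketch under that assumption (the card does not say which trainer produced it; beta and the log-probabilities below are illustrative placeholders, not values from this run):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> tuple[float, float, float]:
    """Sigmoid DPO loss for a single preference pair.

    Returns (loss, chosen_reward, rejected_reward); the two rewards are the
    implicit rewards logged as Rewards/chosen and Rewards/rejected.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written stably as log(1 + exp(-margin))
    loss = math.log1p(math.exp(-margin))
    return loss, chosen_reward, rejected_reward

# Illustrative pair: the policy prefers the chosen response more than the
# reference model does, so the margin is positive and the loss falls
# below log(2) (the value at zero margin).
loss, r_chosen, r_rejected = dpo_loss(-156.0, -207.7, -160.0, -185.6)
```

With these placeholder numbers the chosen reward is positive and the rejected reward negative, mirroring the sign pattern in the card's evaluation metrics.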