thorirhrafn committed
Commit 933ea22
Parent: 2a9f73c

End of training

Files changed (1): README.md (+19 −19)
README.md CHANGED
@@ -18,15 +18,15 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.0071
- - Rewards/chosen: 0.5889
- - Rewards/rejected: -5.0531
+ - Loss: 0.1779
+ - Rewards/chosen: 0.3527
+ - Rewards/rejected: -1.3764
  - Rewards/accuracies: 1.0
- - Rewards/margins: 5.6419
- - Logps/rejected: -235.3402
- - Logps/chosen: -154.7451
- - Logits/rejected: -1.0643
- - Logits/chosen: -0.8789
+ - Rewards/margins: 1.7292
+ - Logps/rejected: -198.5740
+ - Logps/chosen: -157.1067
+ - Logits/rejected: -1.0528
+ - Logits/chosen: -0.8587
 
  ## Model description
 
@@ -45,7 +45,7 @@ More information needed
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - learning_rate: 5e-06
+ - learning_rate: 1e-06
  - train_batch_size: 1
  - eval_batch_size: 1
  - seed: 42
@@ -59,16 +59,16 @@ The following hyperparameters were used during training:
 
  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
  |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.3699 | 0.1 | 25 | 0.1650 | 0.3739 | -1.4468 | 1.0 | 1.8207 | -199.2772 | -156.8949 | -1.0531 | -0.8630 |
- | 0.0503 | 0.2 | 50 | 0.0317 | 0.5533 | -3.2234 | 1.0 | 3.7768 | -217.0439 | -155.1004 | -1.0589 | -0.8704 |
- | 0.0179 | 0.3 | 75 | 0.0145 | 0.5776 | -4.1388 | 1.0 | 4.7164 | -226.1972 | -154.8575 | -1.0622 | -0.8764 |
- | 0.0119 | 0.4 | 100 | 0.0103 | 0.5835 | -4.5814 | 1.0 | 5.1648 | -230.6233 | -154.7993 | -1.0647 | -0.8764 |
- | 0.0092 | 0.5 | 125 | 0.0084 | 0.5818 | -4.8305 | 1.0 | 5.4123 | -233.1143 | -154.8155 | -1.0657 | -0.8775 |
- | 0.0076 | 0.6 | 150 | 0.0077 | 0.5824 | -4.9413 | 1.0 | 5.5236 | -234.2221 | -154.8100 | -1.0651 | -0.8784 |
- | 0.008 | 0.7 | 175 | 0.0074 | 0.5873 | -5.0013 | 1.0 | 5.5886 | -234.8223 | -154.7610 | -1.0653 | -0.8776 |
- | 0.0097 | 0.79 | 200 | 0.0071 | 0.5891 | -5.0424 | 1.0 | 5.6315 | -235.2336 | -154.7425 | -1.0646 | -0.8784 |
- | 0.0077 | 0.89 | 225 | 0.0072 | 0.5884 | -5.0430 | 1.0 | 5.6313 | -235.2391 | -154.7501 | -1.0645 | -0.8790 |
- | 0.0068 | 0.99 | 250 | 0.0071 | 0.5889 | -5.0531 | 1.0 | 5.6419 | -235.3402 | -154.7451 | -1.0643 | -0.8789 |
+ | 0.6603 | 0.1 | 25 | 0.6253 | 0.0416 | -0.1007 | 0.9633 | 0.1423 | -185.8169 | -160.2181 | -1.0525 | -0.8550 |
+ | 0.5342 | 0.2 | 50 | 0.5074 | 0.1130 | -0.3090 | 1.0 | 0.4220 | -187.8993 | -159.5039 | -1.0525 | -0.8569 |
+ | 0.4382 | 0.3 | 75 | 0.4022 | 0.1798 | -0.5442 | 1.0 | 0.7241 | -190.2517 | -158.8354 | -1.0530 | -0.8563 |
+ | 0.3592 | 0.4 | 100 | 0.3212 | 0.2338 | -0.7752 | 1.0 | 1.0090 | -192.5613 | -158.2961 | -1.0531 | -0.8579 |
+ | 0.3035 | 0.5 | 125 | 0.2590 | 0.2824 | -0.9912 | 1.0 | 1.2736 | -194.7217 | -157.8096 | -1.0528 | -0.8583 |
+ | 0.2374 | 0.6 | 150 | 0.2125 | 0.3190 | -1.1966 | 1.0 | 1.5157 | -196.7760 | -157.4438 | -1.0528 | -0.8575 |
+ | 0.2094 | 0.7 | 175 | 0.1868 | 0.3455 | -1.3260 | 1.0 | 1.6714 | -198.0693 | -157.1793 | -1.0528 | -0.8598 |
+ | 0.1886 | 0.79 | 200 | 0.1796 | 0.3491 | -1.3639 | 1.0 | 1.7130 | -198.4486 | -157.1428 | -1.0532 | -0.8617 |
+ | 0.1805 | 0.89 | 225 | 0.1785 | 0.3523 | -1.3731 | 1.0 | 1.7254 | -198.5406 | -157.1107 | -1.0530 | -0.8593 |
+ | 0.1821 | 0.99 | 250 | 0.1779 | 0.3527 | -1.3764 | 1.0 | 1.7292 | -198.5740 | -157.1067 | -1.0528 | -0.8587 |
 
 
  ### Framework versions
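
The reward columns in both versions of the card appear to follow the convention used by preference-optimization trainers such as TRL's `DPOTrainer` (an assumption; the card does not name the trainer), where `Rewards/margins` is `Rewards/chosen − Rewards/rejected`. A minimal sketch, using only the final-epoch values copied from the two card versions, checks that the reported margins are internally consistent up to four-decimal rounding:

```python
# Final eval metrics from the old and new versions of the model card.
# Assumption: Rewards/margins = Rewards/chosen - Rewards/rejected,
# as in TRL-style DPO training logs.
runs = {
    "old (lr 5e-06)": {"chosen": 0.5889, "rejected": -5.0531, "margin": 5.6419},
    "new (lr 1e-06)": {"chosen": 0.3527, "rejected": -1.3764, "margin": 1.7292},
}

for name, m in runs.items():
    derived = m["chosen"] - m["rejected"]
    # Agreement holds only up to the rounding of the reported 4-decimal values.
    assert abs(derived - m["margin"]) < 1e-3, (name, derived)
    print(f"{name}: derived margin {derived:.4f} vs reported {m['margin']}")
```

Both rows pass the check, so the headline margins in each card version are consistent with their own chosen/rejected rewards; the diff simply swaps in the new run's weaker-but-smoother numbers from the lower learning rate.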