weqweasdas committed
Commit 23cf908 • Parent(s): 7ecd22c
Update README.md
README.md CHANGED
@@ -26,9 +26,9 @@ Then, we split the dataset as follows:
 
 ### Training
 
-To use the data more efficiently, we concatenate texts
+To use the data more efficiently, we concatenate texts and split them into 1024-token chunks, rather than padding each batch to its longest text. We then finetune the base model on the SFT dataset for two epochs, using a learning rate of 2e-5 and a linear decay schedule.
 
-We conduct reward modeling with learning rate 5e-6 for 1 epoch and linear decay schedule because it seems that the model easily overfits with more than 1 epoches. We discard the samples longer than 512 tokens so we have approximately
+We conduct reward modeling with a learning rate of 5e-6 for one epoch and a linear decay schedule, since the model easily overfits when trained for more than one epoch. We discard samples longer than 512 tokens, leaving approximately 106K samples in the training set and 5K samples in the test set for reward modeling.
 
 We use bf16 and do not use LoRA in both stages.
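For reference, the packing step described in the updated paragraph can be sketched as follows. This is a minimal sketch, assuming a Hugging Face tokenizer and a `text` column; `pack_texts` and `BLOCK_SIZE` are illustrative names, not taken from this repo:

```python
from itertools import chain

BLOCK_SIZE = 1024  # chunk length stated in the README

def pack_texts(batch, tokenizer):
    """Tokenize a batch, concatenate all token ids into one stream, and
    regroup the stream into fixed BLOCK_SIZE chunks instead of padding
    each batch to its longest text."""
    ids = tokenizer(batch["text"])["input_ids"]
    stream = list(chain.from_iterable(ids))           # one long token stream
    total = (len(stream) // BLOCK_SIZE) * BLOCK_SIZE  # drop the ragged tail
    chunks = [stream[i : i + BLOCK_SIZE] for i in range(0, total, BLOCK_SIZE)]
    # For causal-LM finetuning the labels are the inputs themselves.
    return {"input_ids": chunks, "labels": [list(c) for c in chunks]}
```

With `datasets`, this would typically be applied via `dataset.map(pack_texts, batched=True, remove_columns=dataset.column_names, fn_kwargs={"tokenizer": tokenizer})`, so ragged examples are merged rather than padded.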
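The 512-token cutoff for reward modeling could be implemented as a simple filter. The `chosen`/`rejected` column names are an assumption about the preference data, not confirmed by the diff:

```python
def within_limit(example, tokenizer, max_len=512):
    """Keep a preference pair only if both sides fit in max_len tokens."""
    return all(
        len(tokenizer(example[key])["input_ids"]) <= max_len
        for key in ("chosen", "rejected")  # assumed column names
    )

# e.g. train_set = train_set.filter(within_limit, fn_kwargs={"tokenizer": tokenizer})
# After this filter the README reports ~106K train / ~5K test samples.
```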
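The stated hyperparameters map roughly onto standard `transformers` training arguments as sketched below; the output paths are placeholders and the trainer wiring is omitted:

```python
from transformers import TrainingArguments

sft_args = TrainingArguments(
    output_dir="sft-out",        # placeholder path
    num_train_epochs=2,
    learning_rate=2e-5,
    lr_scheduler_type="linear",  # linear decay schedule
    bf16=True,                   # bf16 in both stages, full finetuning (no LoRA)
)

rm_args = TrainingArguments(
    output_dir="rm-out",         # placeholder path
    num_train_epochs=1,          # more than one epoch overfits, per the README
    learning_rate=5e-6,
    lr_scheduler_type="linear",
    bf16=True,
)
```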