weqweasdas committed on
Commit
23cf908
1 Parent(s): 7ecd22c

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -26,9 +26,9 @@ Then, we split the dataset as follows:
 
 ### Training
 
- To use the data more efficiently, we concatenate texts with an EOS token in between and split them into 1024-sized chunks, rather than padding them to the longest text in each batch. We then finetune the base model on the SFT dataset for two epochs, using a learning rate of 2e-5 and a linear decay schedule.
+ To use the data more efficiently, we concatenate texts and split them into 1024-sized chunks, rather than padding them to the longest text in each batch. We then finetune the base model on the SFT dataset for two epochs, using a learning rate of 2e-5 and a linear decay schedule.
 
- We conduct reward modeling with a learning rate of 5e-6 for one epoch and a linear decay schedule, since the model easily overfits when trained for more than one epoch. We discard samples longer than 512 tokens, leaving approximately 10.6K samples in the training set and 5K samples in the test set for reward modeling.
+ We conduct reward modeling with a learning rate of 5e-6 for one epoch and a linear decay schedule, since the model easily overfits when trained for more than one epoch. We discard samples longer than 512 tokens, leaving approximately 106K samples in the training set and 5K samples in the test set for reward modeling.
 
  We use bf16 and do not use LoRA in either stage.
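
For readers who want to reproduce the packing step described in the edited paragraph, here is a minimal sketch. It is not the repository's actual preprocessing code: the helper name `pack_into_blocks`, the toy inputs, and the optional EOS separator (from the pre-edit wording) are illustrative assumptions. The idea is simply to concatenate tokenized texts into one stream and cut it into fixed 1024-token chunks so that no per-batch padding is needed.

```python
# Hypothetical sketch of the 1024-token packing described above; the actual
# preprocessing script in this repo may differ.
BLOCK_SIZE = 1024  # chunk length used for SFT

def pack_into_blocks(tokenized_texts, eos_token_id=None):
    """Concatenate lists of token ids (optionally separated by an EOS id,
    as in the pre-edit wording) and split the stream into BLOCK_SIZE chunks."""
    stream = []
    for ids in tokenized_texts:
        stream.extend(ids)
        if eos_token_id is not None:
            stream.append(eos_token_id)
    # Drop the trailing remainder so every chunk is exactly BLOCK_SIZE tokens long.
    n_blocks = len(stream) // BLOCK_SIZE
    return [stream[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE] for i in range(n_blocks)]

# Toy usage with fake "documents" of token ids.
chunks = pack_into_blocks([[1, 2, 3], [4, 5], list(range(2500))], eos_token_id=0)
print(len(chunks), all(len(c) == BLOCK_SIZE for c in chunks))  # -> 2 True
```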
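
Likewise, a rough sketch of the 512-token length filter applied before reward modeling. The column names `chosen`/`rejected`, the GPT-2 tokenizer, and the toy data are assumptions for illustration only; the repository's actual filtering script may differ.

```python
# Hypothetical sketch of the 512-token filter applied before reward modeling.
# Column names ("chosen"/"rejected"), the GPT-2 tokenizer, and the toy data are
# assumptions; they are not taken from this repository.
from datasets import Dataset
from transformers import AutoTokenizer

MAX_LEN = 512
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer

def short_enough(example):
    # Keep a preference pair only if both responses fit within MAX_LEN tokens.
    return all(
        len(tokenizer(example[col])["input_ids"]) <= MAX_LEN
        for col in ("chosen", "rejected")
    )

# Toy preference data standing in for the real reward-modeling set.
toy = Dataset.from_dict({
    "chosen": ["a short helpful reply", "word " * 1000],
    "rejected": ["a short unhelpful reply", "word " * 1000],
})
print(toy.filter(short_enough).num_rows)  # -> 1 (the overlong pair is dropped)
```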