vincentmin committed
Commit 9877e68 · Parent: ea9a4cd

Update README.md

Files changed (1): README.md (+2 −1)
README.md CHANGED
@@ -54,7 +54,7 @@ Since the model was trained on oasst1 data, the reward will reflect any biases p
 
 ## Training and evaluation data
 
-The model was trained using QLoRA and the `trl` library's `RewardTrainer` on the first 10000 rows of the [tasksource/oasst1_pairwise_rlhf_reward](https://huggingface.co/datasets/tasksource/oasst1_pairwise_rlhf_reward) dataset.
+The model was trained using QLoRA and the `trl` library's `RewardTrainer` on the [tasksource/oasst1_pairwise_rlhf_reward](https://huggingface.co/datasets/tasksource/oasst1_pairwise_rlhf_reward) dataset. Examples with more than 1024 tokens were filtered out and the training data was restricted to the first 10000 rows of the filtered dataset.
 
 ### Training hyperparameters
 
@@ -79,6 +79,7 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - num_epochs: 1
+- max_seq_length: 1024
 
 ### Training results
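The preprocessing the updated line describes can be sketched as follows. This is a minimal illustration, not the commit's actual code: `token_count` is a whitespace stand-in for the model's real tokenizer, and `rows` is an in-memory stand-in for the pairwise reward dataset's `chosen`/`rejected` columns.

```python
# Sketch of the preprocessing described in the diff: drop examples longer
# than 1024 tokens, then keep the first 10000 rows of what remains.

MAX_SEQ_LENGTH = 1024
NUM_ROWS = 10000

def token_count(text: str) -> int:
    # Placeholder tokenizer: the real run would count tokenizer input_ids.
    return len(text.split())

def filter_and_truncate(rows, max_len=MAX_SEQ_LENGTH, n_rows=NUM_ROWS):
    """Keep rows whose chosen and rejected texts both fit within max_len
    tokens, then restrict to the first n_rows of the filtered set."""
    kept = [
        r for r in rows
        if token_count(r["chosen"]) <= max_len
        and token_count(r["rejected"]) <= max_len
    ]
    return kept[:n_rows]

# Tiny in-memory stand-in for the pairwise reward dataset.
rows = [
    {"chosen": "short answer", "rejected": "worse answer"},
    {"chosen": "x " * 2000, "rejected": "y"},  # over 1024 tokens: filtered out
]
print(len(filter_and_truncate(rows)))  # 1
```

Filtering before truncating matters here: taking the first 10000 rows and then filtering would generally leave fewer than 10000 examples.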