vincentmin committed · Commit 9877e68 · 1 parent: ea9a4cd

Update README.md
README.md CHANGED

@@ -54,7 +54,7 @@ Since the model was trained on oasst1 data, the reward will reflect any biases p
 
 ## Training and evaluation data
 
-The model was trained using QLoRA and the `trl` library's `RewardTrainer` on the
+The model was trained using QLoRA and the `trl` library's `RewardTrainer` on the [tasksource/oasst1_pairwise_rlhf_reward](https://huggingface.co/datasets/tasksource/oasst1_pairwise_rlhf_reward) dataset. Examples with more than 1024 tokens were filtered out and the training data was restricted to the first 10000 rows of the filtered dataset.
 
 ### Training hyperparameters
 
@@ -79,6 +79,7 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - num_epochs: 1
+- max_seq_length: 1024
 
 ### Training results
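For context, the recipe the updated README describes can be sketched roughly as follows. This is a minimal, unverified sketch, not the author's training script: the base model name, the dataset column names (`prompt`, `chosen`, `rejected`), and the LoRA hyperparameters are assumptions, and the code targets a trl-0.7-era API in which `RewardTrainer` consumes pre-tokenized `*_chosen`/`*_rejected` columns. Only the dataset, the 1024-token filter, the 10000-row cap, the linear scheduler, and the single epoch come from the diff above.

```python
# Rough sketch of the recipe the README describes; NOT the author's script.
# Assumptions: base model name, dataset column names ("prompt", "chosen",
# "rejected"), and LoRA hyperparameters.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from trl import RewardConfig, RewardTrainer

MAX_TOKENS = 1024  # the README's filtering threshold and max_seq_length
BASE_MODEL = "tiiuae/falcon-7b"  # placeholder: the diff does not name the base model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# QLoRA step 1: load the base model with 4-bit (NF4) quantized weights.
model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL,
    num_labels=1,  # a single scalar reward per sequence
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
model.config.pad_token_id = tokenizer.pad_token_id

# QLoRA step 2: train low-rank adapters on top of the frozen 4-bit weights.
peft_config = LoraConfig(task_type="SEQ_CLS", r=16, lora_alpha=32, lora_dropout=0.05)

def tokenize_pair(example):
    """Tokenize the preferred and rejected completions of one comparison."""
    chosen = tokenizer(example["prompt"] + example["chosen"])
    rejected = tokenizer(example["prompt"] + example["rejected"])
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

dataset = load_dataset("tasksource/oasst1_pairwise_rlhf_reward", split="train")
dataset = dataset.map(tokenize_pair)

# As in the README: drop pairs where either side exceeds 1024 tokens,
# then keep the first 10000 rows of the filtered dataset.
dataset = dataset.filter(
    lambda ex: len(ex["input_ids_chosen"]) <= MAX_TOKENS
    and len(ex["input_ids_rejected"]) <= MAX_TOKENS
)
dataset = dataset.select(range(min(10_000, len(dataset))))

# The Trainer defaults give AdamW with betas=(0.9, 0.999) and epsilon=1e-08,
# matching the optimizer line in the hyperparameter list above.
training_args = RewardConfig(
    output_dir="oasst1-reward-model",
    num_train_epochs=1,          # num_epochs: 1
    lr_scheduler_type="linear",  # lr_scheduler_type: linear
    max_length=MAX_TOKENS,       # the README's max_seq_length: 1024
)

trainer = RewardTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```

With QLoRA, only the LoRA adapter matrices and the scalar reward head are trained while the quantized base weights stay frozen, which is what makes fine-tuning a reward model of this size on a single GPU practical.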