vincentmin committed · Commit 9877e68 · 1 parent: ea9a4cd

Update README.md
README.md CHANGED

@@ -54,7 +54,7 @@ Since the model was trained on oasst1 data, the reward will reflect any biases p
 
 ## Training and evaluation data
 
-The model was trained using QLoRA and the `trl` library's `RewardTrainer` on the
+The model was trained using QLoRA and the `trl` library's `RewardTrainer` on the [tasksource/oasst1_pairwise_rlhf_reward](https://huggingface.co/datasets/tasksource/oasst1_pairwise_rlhf_reward) dataset. Examples with more than 1024 tokens were filtered out and the training data was restricted to the first 10000 rows of the filtered dataset.
 
 ### Training hyperparameters
 
@@ -79,6 +79,7 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - num_epochs: 1
+- max_seq_length: 1024
 
 ### Training results
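For context, the recipe the updated README describes can be sketched roughly as follows. This is a minimal, unverified sketch, not the author's training script: the base model name, the dataset column names (`prompt`, `chosen`, `rejected`), and the LoRA hyperparameters are assumptions, and the code targets a trl-0.7-era API in which `RewardTrainer` consumes pre-tokenized `*_chosen`/`*_rejected` columns. Only the dataset, the 1024-token filter, the 10000-row cap, the linear scheduler, and the single epoch come from the diff above.

```python
# Rough sketch of the recipe the README describes; NOT the author's script.
# Assumptions: base model name, dataset column names ("prompt", "chosen",
# "rejected"), and LoRA hyperparameters.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from trl import RewardConfig, RewardTrainer

MAX_TOKENS = 1024  # the README's filtering threshold and max_seq_length
BASE_MODEL = "tiiuae/falcon-7b"  # placeholder: the diff does not name the base model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# QLoRA step 1: load the base model with 4-bit (NF4) quantized weights.
model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL,
    num_labels=1,  # a single scalar reward per sequence
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
model.config.pad_token_id = tokenizer.pad_token_id

# QLoRA step 2: train low-rank adapters on top of the frozen 4-bit weights.
peft_config = LoraConfig(task_type="SEQ_CLS", r=16, lora_alpha=32, lora_dropout=0.05)

def tokenize_pair(example):
    """Tokenize the preferred and rejected completions of one comparison."""
    chosen = tokenizer(example["prompt"] + example["chosen"])
    rejected = tokenizer(example["prompt"] + example["rejected"])
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

dataset = load_dataset("tasksource/oasst1_pairwise_rlhf_reward", split="train")
dataset = dataset.map(tokenize_pair)

# As in the README: drop pairs where either side exceeds 1024 tokens,
# then keep the first 10000 rows of the filtered dataset.
dataset = dataset.filter(
    lambda ex: len(ex["input_ids_chosen"]) <= MAX_TOKENS
    and len(ex["input_ids_rejected"]) <= MAX_TOKENS
)
dataset = dataset.select(range(min(10_000, len(dataset))))

# The Trainer defaults give AdamW with betas=(0.9, 0.999) and epsilon=1e-08,
# matching the optimizer line in the hyperparameter list above.
training_args = RewardConfig(
    output_dir="oasst1-reward-model",
    num_train_epochs=1,          # num_epochs: 1
    lr_scheduler_type="linear",  # lr_scheduler_type: linear
    max_length=MAX_TOKENS,       # the README's max_seq_length: 1024
)

trainer = RewardTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```

With QLoRA, only the LoRA adapter matrices and the scalar reward head are trained while the quantized base weights stay frozen, which is what makes fine-tuning a reward model of this size on a single GPU practical.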