Jellywibble committed
Commit 15f2219 • 1 Parent(s): 51c1150
Update README.md
README.md
CHANGED
@@ -67,5 +67,4 @@ The original dataset contains over 50 million rows of completions (chatbot respo
 </figure>
 
 ### Training procedure
-The `gpt2_base_retry_and_continue_5m_reward_model` was trained using a [gpt2](https://huggingface.co/gpt2) base model and a classification head with single output. Binary Cross Entropy loss was used. The model was trained on 4xA40 GPUs, 16 per device batch size and gradient accumulation of 1 (therefore the effective batch size is 64), with 1e-5 learning rate for 2 epochs for a total of 156,240 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs.
-[Weights and Biases Log](https://wandb.ai/jellywibble/reward)
+The `gpt2_base_retry_and_continue_5m_reward_model` was trained using a [gpt2](https://huggingface.co/gpt2) base model and a classification head with single output. Binary Cross Entropy loss was used. The model was trained on 4xA40 GPUs, 16 per device batch size and gradient accumulation of 1 (therefore the effective batch size is 64), with 1e-5 learning rate for 2 epochs for a total of 156,240 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs. For evaluation metrics used during training, please see our [Weights and Biases Log](https://wandb.ai/jellywibble/reward).
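For readers who want a concrete picture of the setup described in the updated paragraph, below is a minimal sketch of one training step. It assumes the single-output classification head is the stock Hugging Face `GPT2ForSequenceClassification` head with `num_labels=1` and that the optimizer is AdamW at the stated 1e-5 learning rate; the example input, the 1.0/0.0 label convention, and the omission of data loading and tensor/pipeline parallelism are illustrative assumptions, not details confirmed by the model card or the commit.

```python
# Sketch only: GPT-2 base model + single-logit classification head trained with
# Binary Cross Entropy loss, as described in the README paragraph above.
# The head choice, optimizer, label convention, and example data are assumptions;
# multi-GPU tensor/pipeline parallelism and real data loading are omitted.
import torch
from torch.nn import BCEWithLogitsLoss
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=1)  # single output logit
model.config.pad_token_id = tokenizer.pad_token_id

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # learning rate from the README
loss_fn = BCEWithLogitsLoss()

# One illustrative step on a hypothetical (completion, label) pair, e.g. label 1.0
# if the user accepted the completion and 0.0 if they retried.
batch = tokenizer(["Hello! How can I help you today?"], return_tensors="pt", padding=True)
labels = torch.tensor([[1.0]])

logits = model(**batch).logits  # shape: (batch_size, 1)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

With 4 GPUs, a per-device batch size of 16, and gradient accumulation of 1, the effective batch size works out to 4 × 16 = 64, matching the figure quoted in the paragraph.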