Jellywibble committed
Commit 15f2219 (parent: 51c1150)

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -67,5 +67,4 @@ The original dataset contains over 50 million rows of completions (chatbot respo
 </figure>
 
 ### Training procedure
-The `gpt2_base_retry_and_continue_5m_reward_model` was trained using a [gpt2](https://huggingface.co/gpt2) base model and a classification head with single output. Binary Cross Entropy loss was used. The model was trained on 4xA40 GPUs, 16 per device batch size and gradient accumulation of 1 (therefore the effective batch size is 64), with 1e-5 learning rate for 2 epochs for a total of 156,240 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs.
-[Weights and Biases Log](https://wandb.ai/jellywibble/reward)
+The `gpt2_base_retry_and_continue_5m_reward_model` was trained using a [gpt2](https://huggingface.co/gpt2) base model and a classification head with single output. Binary Cross Entropy loss was used. The model was trained on 4xA40 GPUs, 16 per device batch size and gradient accumulation of 1 (therefore the effective batch size is 64), with 1e-5 learning rate for 2 epochs for a total of 156,240 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs. For evaluation metrics used during training, please see our [Weights and Biases Log](https://wandb.ai/jellywibble/reward).
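
For context on the setup described in the updated paragraph, below is a minimal sketch of how a single-output GPT-2 reward model like this could be loaded and queried with `transformers`. It is illustrative only: the hub repo id is a placeholder (the commit does not give the full path), and using `AutoModelForSequenceClassification` with `num_labels=1` is an assumption made to match the "classification head with single output" described above.

```python
# Minimal sketch (not from this commit): load a GPT-2 reward model that has a
# single-output classification head and score a candidate chatbot response.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder repo id -- substitute the actual Hugging Face path of the model.
model_id = "gpt2_base_retry_and_continue_5m_reward_model"

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# num_labels=1 gives a single-output head, matching the README's description.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id

# The single logit passed through a sigmoid behaves as a probability-like
# reward, consistent with binary cross-entropy training.
inputs = tokenizer("User: hi\nBot: hello there!", return_tensors="pt")
with torch.no_grad():
    reward = torch.sigmoid(model(**inputs).logits.squeeze(-1))
print(reward.item())
```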