Jellywibble committed
Commit 15f2219 • 1 Parent(s): 51c1150
Update README.md
README.md
CHANGED
@@ -67,5 +67,4 @@ The original dataset contains over 50 million rows of completions (chatbot respo
 </figure>
 
 ### Training procedure
-The `gpt2_base_retry_and_continue_5m_reward_model` was trained using a [gpt2](https://huggingface.co/gpt2) base model and a classification head with single output. Binary Cross Entropy loss was used. The model was trained on 4xA40 GPUs, 16 per device batch size and gradient accumulation of 1 (therefore the effective batch size is 64), with 1e-5 learning rate for 2 epochs for a total of 156,240 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs.
-[Weights and Biases Log](https://wandb.ai/jellywibble/reward)
+The `gpt2_base_retry_and_continue_5m_reward_model` was trained using a [gpt2](https://huggingface.co/gpt2) base model and a classification head with single output. Binary Cross Entropy loss was used. The model was trained on 4xA40 GPUs, 16 per device batch size and gradient accumulation of 1 (therefore the effective batch size is 64), with 1e-5 learning rate for 2 epochs for a total of 156,240 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs. For evaluation metrics used during training, please see our [Weights and Biases Log](https://wandb.ai/jellywibble/reward).
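For readers who want a concrete picture of the setup described in the updated paragraph, below is a minimal sketch of one training step. It assumes the single-output classification head is the stock Hugging Face `GPT2ForSequenceClassification` head with `num_labels=1` and that the optimizer is AdamW at the stated 1e-5 learning rate; the example input, the 1.0/0.0 label convention, and the omission of data loading and tensor/pipeline parallelism are illustrative assumptions, not details confirmed by the model card or the commit.

```python
# Sketch only: GPT-2 base model + single-logit classification head trained with
# Binary Cross Entropy loss, as described in the README paragraph above.
# The head choice, optimizer, label convention, and example data are assumptions;
# multi-GPU tensor/pipeline parallelism and real data loading are omitted.
import torch
from torch.nn import BCEWithLogitsLoss
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=1)  # single output logit
model.config.pad_token_id = tokenizer.pad_token_id

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # learning rate from the README
loss_fn = BCEWithLogitsLoss()

# One illustrative step on a hypothetical (completion, label) pair, e.g. label 1.0
# if the user accepted the completion and 0.0 if they retried.
batch = tokenizer(["Hello! How can I help you today?"], return_tensors="pt", padding=True)
labels = torch.tensor([[1.0]])

logits = model(**batch).logits  # shape: (batch_size, 1)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

With 4 GPUs, a per-device batch size of 16, and gradient accumulation of 1, the effective batch size works out to 4 × 16 = 64, matching the figure quoted in the paragraph.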