ChaiML/gpt2_xl_retry_and_continue_12m_reward_model


Jellywibble committed on Mar 13, 2023

Commit 72779c3 • 1 Parent(s): 4baf87b

Update README.md

Files changed (1)

README.md +21 -1


@@ -68,4 +68,24 @@ The original dataset contains over 50 million rows of completions (chatbot respo
 </figure>
 ### Training procedure
-The `gpt2_xl_retry_and_continue_12m_reward_model` was trained using a [gpt2-xl](https://huggingface.co/gpt2-xl) base model and a classification head with single output. Binary Cross Entropy loss was used. The model was trained on 4xA40 GPUs, 16 per device batch size and gradient accumulation of 1 (therefore the effective batch size is 64), with 1e-5 learning rate for 2 epochs for a total of 375,000 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs.

+The `gpt2_xl_retry_and_continue_12m_reward_model` was trained using a [gpt2-xl](https://huggingface.co/gpt2-xl) base model and a classification head with a single output, using binary cross-entropy loss. The model was trained on 4xA40 GPUs with a per-device batch size of 16 and gradient accumulation of 1 (an effective batch size of 64), at a learning rate of 1e-5 for 2 epochs, for a total of 375,000 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs.
+### BibTeX entry
+To cite this model:
+```bibtex
+@misc{
+  author = {{Chai Research} and Irvine and Boubert and Raina and Liusie and Mudupalli and Korshuk and Liu and Cremer and Assassi and C. Beauchamp and Lu and Rialan and W. Beauchamp},
+  title = {{Rewarding chatbots for real-world engagement with millions of users}},
+  howpublished = {\url{https://arxiv.org/abs/2303.06135}},
+  year = 2023,
+  month = Mar
+}
+```
+If you use this model, we would love to hear about it! Reach out by [email](mailto:thomas@chai-research.com?subject=Chai%20Research%20Paper%20Enquiry) or on Discord.
+### Acknowledgements
+This project would not have been possible without the support from members of [Seamless Capital](https://www.seamless-capital.com/).
+We thank the following authors from the [Machine Intelligence Laboratory](https://mi.eng.cam.ac.uk/) for their collaboration:
+- [Vyas Raina](https://www.linkedin.com/in/vyas-raina-71b226152/)
+- [Adian Liusie](https://www.linkedin.com/in/adian-liusie-00b60511a/)
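
The figures in the Training procedure paragraph can be cross-checked with a little arithmetic. The sketch below is not the authors' training code. It assumes the training set holds about 12 million rows (a guess based on the `12m` in the model name; the README hunk shown here does not state it) and that the effective batch size is the product of GPU count, per-device batch size, and gradient accumulation steps, which matches the stated 64. The helper names are hypothetical. It also includes a plain-Python version of binary cross-entropy on a single logit, the loss the paragraph names for the single-output head.

```python
import math


def effective_batch_size(num_gpus: int, per_device_batch: int, grad_accum_steps: int) -> int:
    """Examples consumed per optimizer step across all devices."""
    return num_gpus * per_device_batch * grad_accum_steps


def total_optimizer_steps(num_rows: int, epochs: int, batch_size: int) -> int:
    """Optimizer steps for the whole run, rounding up the last partial batch of each epoch."""
    return math.ceil(num_rows / batch_size) * epochs


def bce_with_logits(logit: float, label: float) -> float:
    """Numerically stable binary cross-entropy for one raw logit and a 0/1 label.

    Uses the identity max(x, 0) - x * y + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# Values stated in the README: 4 GPUs, 16 per device, gradient accumulation of 1.
batch = effective_batch_size(num_gpus=4, per_device_batch=16, grad_accum_steps=1)

# ASSUMPTION: roughly 12 million training rows, inferred from the model name.
rows = 12_000_000
steps = total_optimizer_steps(num_rows=rows, epochs=2, batch_size=batch)

print(batch)  # 64
print(steps)  # 375000
```

Under the 12-million-row assumption, the result agrees with the 375,000 steps reported in the README.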