Jellywibble committed on
Commit
69f4da9
1 Parent(s): 0cb1291

Update README.md

Files changed (1)
  1. README.md +21 -1
README.md CHANGED
@@ -68,4 +68,24 @@ The original dataset contains over 50 million rows of completions (chatbot respo
  </figure>
 
  ### Training procedure
- The `gpt2_base_retry_and_continue_5m_reward_model` was trained from a [gpt2](https://huggingface.co/gpt2) base model with a single-output classification head, using binary cross-entropy loss. Training ran on four A40 GPUs with a per-device batch size of 16 and gradient accumulation of 1 (an effective batch size of 64), at a learning rate of 1e-5 for 2 epochs, for a total of 156,240 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs. For the evaluation metrics tracked during training, see our [Weights and Biases log](https://wandb.ai/jellywibble/reward).
+ The `gpt2_base_retry_and_continue_5m_reward_model` was trained from a [gpt2](https://huggingface.co/gpt2) base model with a single-output classification head, using binary cross-entropy loss. Training ran on four A40 GPUs with a per-device batch size of 16 and gradient accumulation of 1 (an effective batch size of 64), at a learning rate of 1e-5 for 2 epochs, for a total of 156,240 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs. For the evaluation metrics tracked during training, see our [Weights and Biases log](https://wandb.ai/jellywibble/reward).
+
+ ### BibTeX entry
+ To cite this model:
+ ```bibtex
+ @misc{irvine2023rewarding,
+ author = {{Chai Research} and Irvine and Boubert and Raina and Liusie and Mudupalli and Korshuk and Liu and Cremer and Assassi and C. Beauchamp and Lu and Rialan and W. Beauchamp},
+ title = {{Rewarding chatbots for real-world engagement with millions of users}},
+ howpublished = {\url{https://arxiv.org/abs/2303.06135}},
+ year = 2023,
+ month = Mar
+ }
+ ```
+ If you use this model, we would love to hear about it! Reach out via our [correspondence email](mailto:thomas@chai-research.com?subject=Chai%20Research%20Paper%20Enquiry) or on Discord.
+
+ ### Acknowledgements
+ This project would not have been possible without the support of members of [Seamless Capital](https://www.seamless-capital.com/).
+
+ We thank the following authors from the [Machine Intelligence Laboratory](https://mi.eng.cam.ac.uk/) for their collaboration:
+ - [Vyas Raina](https://www.linkedin.com/in/vyas-raina-71b226152/)
+ - [Adian Liusie](https://www.linkedin.com/in/adian-liusie-00b60511a/)
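The Training procedure paragraph in this README states the batch arithmetic and the loss but does not show either. The following is a minimal, stdlib-only sketch of both, under stated assumptions. First, it assumes the effective batch size of 64 comes from each of the four GPUs processing its own batch of 16, which is the arithmetic the README gives. Second, it assumes the single-output head emits a raw logit that is scored with binary cross-entropy against a 0/1 label, in the same way PyTorch's `BCEWithLogitsLoss` works. All function and constant names here are hypothetical illustrations, not the authors' training code.

```python
import math
from typing import Sequence

# Hyperparameters as stated in the README's Training procedure section.
NUM_GPUS = 4
PER_DEVICE_BATCH_SIZE = 16
GRAD_ACCUMULATION_STEPS = 1
NUM_EPOCHS = 2
TOTAL_STEPS = 156_240


def effective_batch_size(num_gpus: int, per_device: int, grad_accum: int) -> int:
    """Examples per optimizer step, assuming each GPU processes its own batch."""
    return num_gpus * per_device * grad_accum


def bce_with_logits(logit: float, label: float) -> float:
    """Numerically stable binary cross-entropy for one raw (pre-sigmoid) logit.

    Equals -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))], rewritten as
    max(z, 0) - z*y + log(1 + exp(-|z|)) so that exp() never overflows.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean_bce(logits: Sequence[float], labels: Sequence[float]) -> float:
    """Average the per-example loss over a batch of single-output scores."""
    return sum(bce_with_logits(z, y) for z, y in zip(logits, labels)) / len(logits)


batch = effective_batch_size(NUM_GPUS, PER_DEVICE_BATCH_SIZE, GRAD_ACCUMULATION_STEPS)
steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS
examples_per_epoch = steps_per_epoch * batch

print(batch)               # 64
print(steps_per_epoch)     # 78120
print(examples_per_epoch)  # 4999680

# A logit of 0 means sigmoid = 0.5, so the loss is log(2) whatever the label is.
print(round(mean_bce([0.0, 0.0], [1.0, 0.0]), 4))  # 0.6931
```

Under these assumptions, each epoch covers 78,120 × 64 = 4,999,680 examples, or roughly 5 million. That figure appears to match the `5m` in the model name. The log-sum-exp form of the loss is used instead of computing `sigmoid` and then `log`, because the naive form can underflow to `log(0)` when logits are large.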