kaitchup/OPT-350M-RM-DSChat

bnjmnmarie committed on Sep 27, 2023

Commit 4dc6480 · 1 Parent(s): db5e777

Update README.md

Files changed (1)

README.md +27 -0

@@ -1,3 +1,30 @@
 ---
 license: cc-by-nc-sa-4.0
+datasets:
+- Dahoas/rm-static
+- Dahoas/synthetic-instruct-gptj-pairwise
+- Anthropic/hh-rlhf
+language:
+- en
 ---
+# Model Card for OPT-350M-RM-DSChat
+This model is a reward model for RLHF, fine-tuned using DeepSpeed Chat.
+It is based on OPT-350M.
+## Model Details
+### Model Description
+- **Developed by:** [The Kaitchup](https://kaitchup.substack.com/)
+- **Model type:** Reward model
+- **Language(s) (NLP):** English
+- **License:** cc-by-nc-sa-4.0
+- **Finetuned from model:** [facebook/opt-350m](https://huggingface.co/facebook/opt-350m)
+### Model Sources
+The model was trained following the procedure described in this article:
+[Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #2: Training a Reward Model](https://kaitchup.substack.com/p/train-instruct-llms-on-your-gpu-with-1e1)
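
The card does not state the training objective. As a rough illustration only, reward models trained on chosen/rejected pairs (the shape of the datasets listed in the metadata) are commonly fit with a pairwise ranking loss, `-log(sigmoid(r_chosen - r_rejected))`. The sketch below assumes that objective and plain scalar rewards per response; the function name `pairwise_reward_loss` is hypothetical and is not taken from DeepSpeed Chat or from this model's training code.

```python
import math


def pairwise_reward_loss(chosen_rewards, rejected_rewards):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over a batch of pairs.

    Hypothetical helper for illustration; assumes one scalar reward
    per response, which is a simplification.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("both lists must have the same length")
    if not chosen_rewards:
        raise ValueError("at least one pair is required")

    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
        # for large positive or negative margins.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)


if __name__ == "__main__":
    # The loss shrinks as the chosen response outscores the rejected one.
    print(pairwise_reward_loss([2.0, 1.5], [0.0, 1.0]))
    print(pairwise_reward_loss([0.0, 1.0], [2.0, 1.5]))
```

Under this assumed objective, the loss is small when the model scores the preferred response well above the rejected one, and it grows roughly linearly when the ranking is inverted.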