kaitchup/OPT-1.3B-RLHF-DSChatLoRA

bnjmnmarie committed on Sep 27, 2023

Commit 6fd1421 • 1 Parent(s): 5ff8f6d

Update README.md

Files changed (1)

README.md +25 -0

@@ -1,3 +1,28 @@
 ---
 license: cc-by-nc-sa-4.0
+datasets:
+- Dahoas/rm-static
+language:
+- en
 ---
+# Model Card for OPT-1.3B-RLHF-DSChatLoRA
+This is a chat model fine-tuned with RLHF using DeepSpeed Chat and LoRA.
+It is based on OPT-1.3B.
+## Model Details
+### Model Description
+- **Developed by:** [The Kaitchup](https://kaitchup.substack.com/)
+- **Model type:** Causal language model
+- **Language(s) (NLP):** English
+- **License:** cc-by-nc-sa-4.0
+- **Finetuned from model:** [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b)
+### Model Sources
+The model has been trained with the procedure described in this article:
+[Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #3: Reinforcement Learning with Human Feedback](https://kaitchup.substack.com/p/train-instruct-llms-on-your-gpu-with-6a5)
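
The model card above does not include a usage snippet, so here is a minimal sketch of how the checkpoint might be queried with the `transformers` library. It rests on several assumptions that the card does not confirm: that the repository holds a full (LoRA-merged) checkpoint loadable with `AutoModelForCausalLM`, that the tokenizer of the base model `facebook/opt-1.3b` is compatible, and that prompts follow the `Human: ... Assistant:` convention used by DeepSpeed Chat. The helper names `build_prompt` and `generate_reply` are hypothetical.

```python
# Repository name taken from this page; base model taken from the model card.
MODEL_ID = "kaitchup/OPT-1.3B-RLHF-DSChatLoRA"
BASE_TOKENIZER_ID = "facebook/opt-1.3b"  # assumption: base tokenizer is compatible


def build_prompt(user_message: str) -> str:
    """Wrap a user message in a Human/Assistant chat prompt.

    Assumption: the model expects the DeepSpeed Chat style prompt format.
    """
    return f"Human: {user_message.strip()} Assistant:"


def generate_reply(user_message: str, max_new_tokens: int = 128) -> str:
    """Load the model and generate a reply (downloads weights on first call).

    Assumption: the repository contains a merged checkpoint that
    AutoModelForCausalLM can load directly, not a separate LoRA adapter.
    """
    # Imported here so that the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_TOKENIZER_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the tokens generated after the prompt.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `generate_reply("How do I clean a keyboard?")` would download the weights on first use. If the repository turns out to contain only a LoRA adapter, the loading step would need to go through an adapter library instead.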