trl-lib/llama-7b-se-rm-peft

natolambert committed on Apr 6, 2023

Commit 7bf36fd (1 parent: a804f8f)

Update README.md

Files changed (1)

README.md +21 -7

@@ -15,12 +15,16 @@ datasets:
 # Llama-se-rm-peft
-Adapter weights of a reward model based on LLaMa. Authored by Edward Beeching, Younes Belkada, Kashif Rasul, Lewis Tunstall and Leandro von Werra.
-For more info check out the [blog post]() and [github example]().
+Adapter weights of a reward model based on LLaMa (see Meta's LLaMA release for the original LLaMA model).
+For more info check out the [blog post](https://huggingface.co/blog/stackllama) and [github example](https://github.com/lvwerra/trl/tree/main/examples/stack_llama/scripts).
 ## Model Description
-**Llama-se-rm** is a Llama-based model that has been first fine-tuned on the Stack Exchange dataset and used for reward modeling using a Stack Exchange Data. This dataset consists of questions and answers from various domains in Stack Exchange, such as programming, mathematics, physics, and more. The model is designed to generate human-like responses to questions in these domains. The model has been training to respond to prompts with the following template:
+**Llama-se-rm** is a Llama-based model that has been first fine-tuned on the Stack Exchange dataset and used for reward modeling using a Stack Exchange Data.
+This dataset consists of questions and answers from various domains in Stack Exchange, such as programming, mathematics, physics, and more.
+The model is designed to generate human-like responses to questions in these domains.
+The model has been training to respond to prompts with the following template:
 ```
 Question: <Query>
@@ -44,10 +48,20 @@ While this demographic information likely varies by topic, disparities between t
 Additionally, the model may generate answers that are incorrect or misleading due to the inherent limitations of the Llama architecture.
 ## BibTeX entry and citation info
 ```bibtex
-@misc{beeching2023llama,
-  title={StackLLaMa: An RL Fine-tuned LLaMa Model for Stack Exchange Question and Answering},
-  author={Beeching, Edward and Belkada, Younes and Rasul, Kashif and Tunstall, Lewis and von Werra, Leandro},
-  year={2023}
+@misc {beeching2023stackllama,
+	author       = { Edward Beeching and
+                     Younes Belkada and
+                     Kashif Rasul and
+                     Lewis Tunstall and
+                     Leandro von Werra and
+                     Nazneen Rajani and
+                     Nathan Lambert
+                   },
+	title        = { StackLLaMa: An RL Fine-tuned LLaMa Model for Stack Exchange Question and Answering },
+	year         = 2023,
+	url          = { https://huggingface.co/trl-lib/llama-7b-se-rm-peft },
+	publisher    = { Hugging Face Blog }
 }
 ```