natolambert committed
Commit
7bf36fd
1 Parent(s): a804f8f

Update README.md

  1. README.md +21 -7
README.md CHANGED
@@ -15,12 +15,16 @@ datasets:
 
 
  # Llama-se-rm-peft
- Adapter weights of a reward model based on LLaMa. Authored by Edward Beeching, Younes Belkada, Kashif Rasul, Lewis Tunstall and Leandro von Werra.
- For more info check out the [blog post]() and [github example]().
+ Adapter weights of a reward model based on LLaMa (see Meta's LLaMA release for the original LLaMA model).
+ For more info check out the [blog post](https://huggingface.co/blog/stackllama) and [github example](https://github.com/lvwerra/trl/tree/main/examples/stack_llama/scripts).
+
 
 
  ## Model Description
- **Llama-se-rm** is a Llama-based model that has been first fine-tuned on the Stack Exchange dataset and used for reward modeling using a Stack Exchange Data. This dataset consists of questions and answers from various domains in Stack Exchange, such as programming, mathematics, physics, and more. The model is designed to generate human-like responses to questions in these domains. The model has been training to respond to prompts with the following template:
+ **Llama-se-rm** is a Llama-based model that has been first fine-tuned on the Stack Exchange dataset and used for reward modeling using a Stack Exchange Data.
+ This dataset consists of questions and answers from various domains in Stack Exchange, such as programming, mathematics, physics, and more.
+ The model is designed to generate human-like responses to questions in these domains.
+ The model has been training to respond to prompts with the following template:
 
  ```
  Question: <Query>
@@ -44,10 +48,20 @@ While this demographic information likely varies by topic, disparities between t
  Additionally, the model may generate answers that are incorrect or misleading due to the inherent limitations of the Llama architecture.
  ## BibTeX entry and citation info
 
+
  ```bibtex
- @misc{beeching2023llama,
- title={StackLLaMa: An RL Fine-tuned LLaMa Model for Stack Exchange Question and Answering},
- author={Beeching, Edward and Belkada, Younes and Rasul, Kashif and Tunstall, Lewis and von Werra, Leandro},
- year={2023}
+ @misc {beeching2023stackllama,
+ author = { Edward Beeching and
+ Younes Belkada and
+ Kashif Rasul and
+ Lewis Tunstall and
+ Leandro von Werra and
+ Nazneen Rajani and
+ Nathan Lambert
+ },
+ title = { StackLLaMa: An RL Fine-tuned LLaMa Model for Stack Exchange Question and Answering },
+ year = 2023,
+ url = { https://huggingface.co/trl-lib/llama-7b-se-rm-peft },
+ publisher = { Hugging Face Blog }
  }
  ```
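
Since the README describes these files as adapter weights for a reward model, a minimal sketch of how such an adapter might be loaded with `peft` and `transformers` is given here. Several things are assumptions rather than facts from the README: that the adapter targets a sequence-classification head with a single scalar output, that the adapter config records a usable base checkpoint, and the helper names `load_reward_model` and `score`, which are hypothetical. The repository id is taken from the `url` field of the citation.

```python
from typing import Optional

# Repository id as it appears in the citation URL of the README.
ADAPTER_ID = "trl-lib/llama-7b-se-rm-peft"


def load_reward_model(adapter_id: str = ADAPTER_ID, base_id: Optional[str] = None):
    """Attach PEFT adapter weights to a base classifier (hypothetical helper).

    Assumption: the adapter was trained for sequence classification with one
    output (a scalar reward). Pass `base_id` explicitly if the base checkpoint
    recorded in the adapter config is not available to you.
    """
    # Imported lazily so this module can be read without the heavy dependencies.
    from peft import PeftConfig, PeftModel
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    if base_id is None:
        # Use whatever base checkpoint the adapter config recorded.
        base_id = PeftConfig.from_pretrained(adapter_id).base_model_name_or_path

    base = AutoModelForSequenceClassification.from_pretrained(base_id, num_labels=1)
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, tokenizer


def score(model, tokenizer, text: str) -> float:
    """Return the scalar reward the model assigns to one formatted prompt."""
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 1) under the assumption above
    return logits[0, 0].item()
```

The `text` passed to `score` would be a question and answer formatted with the prompt template given in the README's Model Description section.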