Transformers
PyTorch
English
trl
rlhf
Commit e7487f2 (parent: 51658cd), committed by natolambert

Update README.md

Files changed (1): README.md (+19, -6)

README.md CHANGED
````diff
@@ -14,12 +14,15 @@ datasets:
 ![pull_figure](https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/stack-llama.png)
 
 # Llama-se-rl-peft
-Adapter weights of an RL fine-tuned model based on LLaMa. Authored by Edward Beeching, Younes Belkada, Kashif Rasul, Lewis Tunstall and Leandro von Werra.
+Adapter weights of an RL fine-tuned model based on LLaMA (see Meta's LLaMA release for the original LLaMA model).
 For more info check out the [blog post](https://huggingface.co/blog/stackllama) and [github example](https://github.com/lvwerra/trl/tree/main/examples/stack_llama/scripts).
 
 
 ## Model Description
-**Llama-se-rl** is a Llama-based model that has been first fine-tuned on the Stack Exchange dataset and then RL fine-tuned using a Stack Exchange Reward Model. This dataset consists of questions and answers from various domains in Stack Exchange, such as programming, mathematics, physics, and more. The model is designed to generate human-like responses to questions in these domains. The model has been trained to respond to prompts with the following template:
+**Llama-se-rl** is a Llama-based model that has been first fine-tuned on the Stack Exchange dataset and then RL fine-tuned using a Stack Exchange Reward Model.
+This dataset consists of questions and answers from various domains in Stack Exchange, such as programming, mathematics, physics, and more.
+The model is designed to generate human-like responses to questions in these domains.
+The model has been trained to respond to prompts with the following template:
 
 ```
 Question: <Query>
@@ -45,9 +48,19 @@ Additionally, the model may generate answers that are incorrect or misleading du
 ## BibTeX entry and citation info
 
 ```bibtex
-@misc{beeching2023llama,
-  title={StackLLaMa: An RL Fine-tuned LLaMa Model for Stack Exchange Question and Answering},
-  author={Beeching, Edward and Belkada, Younes and Rasul, Kashif and Tunstall, Lewis and von Werra, Leandro},
-  year={2023}
+@misc{beeching2023stackllama,
+  author = { Edward Beeching and
+             Younes Belkada and
+             Kashif Rasul and
+             Lewis Tunstall and
+             Leandro von Werra and
+             Nazneen Rajani and
+             Nathan Lambert
+           },
+  title = { StackLLaMa: An RL Fine-tuned LLaMa Model for Stack Exchange Question and Answering },
+  year = 2023,
+  url = { https://huggingface.co/trl-lib/llama-7b-se-rl-peft },
+  doi = { 10.57967/hf/0513 },
+  publisher = { Hugging Face }
 }
 ```
````
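The card's prompt template can be applied with a small helper. This is an illustrative sketch, not part of the model card: the `build_prompt` function is hypothetical, and the `Answer:` continuation is an assumption based on the StackLLaMA blog post (only the `Question: <Query>` line is visible in this diff excerpt).

```python
def build_prompt(query: str) -> str:
    """Wrap a raw question in the Question/Answer template the model expects.

    The "Answer:" suffix is an assumption from the StackLLaMA blog post;
    the visible excerpt of the card only shows the "Question: <Query>" line.
    """
    return f"Question: {query}\n\nAnswer: "


prompt = build_prompt("How do I sort a list in Python?")
print(prompt)

# To apply the adapter weights themselves (sketch, not run here; requires
# downloading a base LLaMA checkpoint plus the peft and transformers packages):
#   from transformers import LlamaForCausalLM
#   from peft import PeftModel
#   base = LlamaForCausalLM.from_pretrained("<base LLaMA checkpoint>")
#   model = PeftModel.from_pretrained(base, "trl-lib/llama-7b-se-rl-peft")
```

The model's completion would then be everything generated after the trailing `Answer: ` marker.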