edbeeching (HF staff) committed
Commit 0452f71
1 Parent(s): cfddde7

Update README.md

Files changed (1)
  1. README.md +20 -24
README.md CHANGED
@@ -1,42 +1,38 @@
 ---
 license: apache-2.0
+language:
+- en
 tags:
 - trl
 - transformers
 - reinforcement-learning
 ---
 
-# TRL Model
+# Llama-se-rl-adapter
+Adapter weights of an RL fine-tuned model based on LLaMA. Authored by Edward Beeching, Younes Belkada, Kashif Rasul, Lewis Tunstall and Leandro von Werra.
 
-This is a [TRL language model](https://github.com/lvwerra/trl) that has been fine-tuned with reinforcement learning to
-guide the model outputs according to a value, function, or human feedback. The model can be used for text generation.
 
-## Usage
+## Model Description
+**Llama-se-rl** is a LLaMA-based model that was first fine-tuned on the Stack Exchange dataset and then RL fine-tuned using a Stack Exchange reward model. The dataset consists of questions and answers from various Stack Exchange domains, such as programming, mathematics, and physics, and the model is designed to generate human-like answers to questions in those domains. It has been trained to respond to prompts with the following template:
 
-To use this model for inference, first install the TRL library:
-
-```bash
-python -m pip install trl
 ```
+Question: <Query>
 
-You can then generate text as follows:
-
-```python
-from transformers import pipeline
-
-generator = pipeline("text-generation", model="lvwerra/runs_truncate/step_350")
-outputs = generator("Hello, my llama is cute")
+Answer: <Response>
 ```
 
-If you want to use the model for training or to obtain the outputs from the value head, load the model as follows:
+## Intended Uses & Limitations
+**Llama-se-rl** is intended for generating answers to questions in the domains covered by the Stack Exchange dataset, such as programming, mathematics, and physics. It may not perform well on questions outside these domains or on questions that require highly specific or technical knowledge.
 
-```python
-from transformers import AutoTokenizer
-from trl import AutoModelForCausalLMWithValueHead
+## Limitations and Bias
+The **Llama-se-rl** model inherits the limitations and biases of both the LLaMA model and the Stack Exchange dataset. The dataset may be biased in the topics it covers and in the users who contribute to it; it does not cover every domain, and the quality of its answers varies. The model may therefore generate answers that are incorrect or misleading, due either to biases in the training data or to the inherent limitations of the LLaMA architecture.
 
-tokenizer = AutoTokenizer.from_pretrained("lvwerra/runs_truncate/step_350")
-model = AutoModelForCausalLMWithValueHead.from_pretrained("lvwerra/runs_truncate/step_350")
+## BibTeX entry and citation info
 
-inputs = tokenizer("Hello, my llama is cute", return_tensors="pt")
-outputs = model(**inputs, labels=inputs["input_ids"])
-```
+```bibtex
+@misc{beeching2023llama,
+  title={StackLLaMA: An RL Fine-tuned LLaMA Model for Stack Exchange Question and Answering},
+  author={Beeching, Edward and Belkada, Younes and Rasul, Kashif and Tunstall, Lewis and von Werra, Leandro},
+  year={2023}
+}
+```
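
Because this repository contains adapter weights rather than a full model, the adapter has to be attached to a LLaMA base model before use. A minimal sketch with the PEFT library follows; both model ids are placeholders, since the card pins neither the base-model path nor this adapter's hub id:

```python
# Sketch: attach the RL fine-tuned adapter to a LLaMA base model with PEFT.
# Both ids below are hypothetical placeholders, not values from this card.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "path/to/llama-base"        # hypothetical: LLaMA base weights
adapter_id = "path/to/llama-se-rl-adapter"  # hypothetical: this repo's adapter

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(base_model_id)

# PeftModel wraps the frozen base model and loads the adapter weights on top.
model = PeftModel.from_pretrained(base_model, adapter_id)
```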
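Given the `Question: ... Answer:` template described in the card, generation might then look as follows; the example question and the decoding settings are illustrative, not values taken from the card:

```python
# Sketch: query the model with the "Question: ... Answer:" prompt template.
prompt = "Question: How do I sort a list of integers in Python?\n\nAnswer: "
inputs = tokenizer(prompt, return_tensors="pt")

# Decoding settings are illustrative defaults, not tuned values from this card.
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```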