amirabdullah19852020 committed f87bf24 (parent: 91e805d)

Update README.md

README.md (changed)
@@ -10,6 +10,7 @@ tags:
 
 This is a [TRL language model](https://github.com/huggingface/trl) that has been fine-tuned with reinforcement learning to
 guide the model outputs according to a value, function, or human feedback. The model can be used for text generation.
+This was used as a test model in the reward interpretability study at https://arxiv.org/abs/2310.08164.
 
 ## Usage
 