amirabdullah19852020/pythia-70m_sentiment_reward

Reinforcement Learning

text-generation

text-generation-inference

Inference Endpoints


amirabdullah19852020 committed on Feb 10

Commit f87bf24 • 1 Parent(s): 91e805d

Update README.md

Files changed (1)

README.md +1 -0


@@ -10,6 +10,7 @@ tags:
 This is a [TRL language model](https://github.com/huggingface/trl) that has been fine-tuned with reinforcement learning to
  guide the model outputs according to a value, function, or human feedback. The model can be used for text generation.
+ This was used as a test model in the reward interpretability study at https://arxiv.org/abs/2310.08164.
 ## Usage