allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value

Token Classification

text-generation-inference


hamishivi committed on Jun 12, 2024

Commit 8cfec27 · verified · 1 Parent(s): c4c63ca

Update README.md

Files changed (1)

README.md +1 -1

@@ -18,7 +18,7 @@ license: apache-2.0
 Tulu is a series of language models that are trained to act as helpful assistants.
 Tulu V2.5 is a series of models trained using DPO and PPO starting from the [Tulu 2 suite](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101).
-This is a **value** model produced during the PPO training of [this](https://huggingface.co/allenai/tulu-v2.5-70b-preference-mix-rm) model.
+This is a **value** model produced during the PPO training of [this](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm) model.
 We release the value model as it may provide a good starting point for additional research or improved decoding with our released PPO models.
 For more details, read the paper: