kaist-ai/janus-rm-7b


Seongyun committed on May 28

Commit 8b1bd5b • 1 Parent(s): b3123bf

Update README.md

Files changed (1)

README.md +1 -1


@@ -17,7 +17,7 @@ library_name: transformers
 # TL; DR
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6550c4f27bbfce1878f5f280/vrQl8D8FV3vqUJYbPgsiG.png)
-Janus is a model trained using [Mistral-7B-v0.2](https://huggingface.co/mistral-community/Mistral-7B-v0.2) as its base model. Janus has been trained on [Multifaceted Collection](https://huggingface.co/datasets/kaist-ai/Multifaceted-Collection-SFT), a preference dataset containing 192k unique system messages for aligning LLMs to diverse human preferences. Janus not only excels at generating personalized responses that cater to various human preferences but is also adept at producing responses that are generally preferred for being helpful and harmless.
+Janus is a model trained using [Mistral-7B-v0.2](https://huggingface.co/mistral-community/Mistral-7B-v0.2) as its base model. Janus has been trained on [Multifaceted Collection](https://huggingface.co/datasets/kaist-ai/Multifaceted-Collection-SFT), a preference dataset containing 196k unique system messages for aligning LLMs to diverse human preferences. Janus not only excels at generating personalized responses that cater to various human preferences but is also adept at producing responses that are generally preferred for being helpful and harmless.
 # Model Details
 Janus-RM-7B is a reward model created by training Janus with Multifaceted-Collection-RM. Janus-RM-7B generates rewards when provided with various system messages and instructions, along with the personalized responses generated in accordance with these. This can be utilized to perform tasks such as PPO and best-of-n sampling.
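
Reviewer note: the Model Details text mentions best-of-n sampling without showing what it looks like. Below is a minimal sketch of the selection step only, assuming a reward function that takes a system message, an instruction, and a response and returns a scalar. The names `best_of_n` and `toy_reward` are hypothetical and not part of this repository, and `toy_reward` is a placeholder that does not call Janus-RM-7B. How the real model is loaded and queried is not covered by this diff.

```python
from typing import Callable, Sequence


def best_of_n(
    system_message: str,
    instruction: str,
    candidates: Sequence[str],
    reward_fn: Callable[[str, str, str], float],
) -> str:
    """Return the candidate response that receives the highest reward."""
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate response")
    # Score every candidate under the same system message and instruction,
    # then keep the top-scoring one (the first one wins on ties).
    return max(
        candidates,
        key=lambda response: reward_fn(system_message, instruction, response),
    )


def toy_reward(system_message: str, instruction: str, response: str) -> float:
    # Hypothetical stand-in for a reward model call: shorter responses
    # score higher. A real setup would score with a reward model instead.
    return -float(len(response.split()))


candidates = [
    "A long answer that keeps going and going without getting to the point.",
    "Short answer.",
]
best = best_of_n("Be concise.", "Explain the idea.", candidates, toy_reward)
print(best)
```

Passing the reward function as a parameter keeps the selection logic independent of how rewards are computed, so the placeholder can be swapped for a real reward model call without changing `best_of_n`.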