RyanYr/bt-rm-llama3.1-hendrydong-preference_700K

Text Classification

Generated from Trainer

text-generation-inference

Inference Endpoints

RyanYr committed on Aug 8

Commit 1a438c2 • 1 Parent(s): 713e875

Update README.md

Files changed (1)

README.md +1 -1

@@ -11,7 +11,7 @@ should probably proofread and complete it, then remove this comment. -->
 # bt-rm
-This model was trained from scratch on an unknown dataset.
+This model was trained from LLaMA 3.1 8B Instruct with dataset `hendrydong/preference_700K` (Preprocessed dataset `RyanYr/preference_700K_llama31_tokenized`). Training script is https://github.com/yurun-yuan/RLHF-Reward-Modeling/blob/4b827117dc9a85062c396eb62200b48e6dbfd596/bradley-terry-rm/llama3_rm.py
 ## Model description
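
The `bt-rm` name and the `bradley-terry-rm` directory in the training script URL suggest a Bradley-Terry reward model trained on preference pairs. As a rough illustration of that objective only, the pure-Python sketch below computes the pairwise loss for a single (chosen, rejected) pair of scalar rewards. The function names `bradley_terry_loss` and `preference_probability` are hypothetical and are not taken from the linked script, which may differ in its details.

```python
import math


def preference_probability(chosen_reward: float, rejected_reward: float) -> float:
    """Bradley-Terry probability that the chosen response is preferred."""
    margin = chosen_reward - rejected_reward
    # Numerically stable sigmoid(margin).
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    exp_margin = math.exp(margin)
    return exp_margin / (1.0 + exp_margin)


def bradley_terry_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Negative log-likelihood of the observed preference: -log(sigmoid(margin))."""
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Hypothetical reward scores for one preference pair.
    print(preference_probability(1.5, 0.5))
    print(bradley_terry_loss(1.5, 0.5))
```

The loss shrinks as the chosen response's reward exceeds the rejected one by a larger margin, which is the signal such a reward model is trained on.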