LossFunctionLover/pairwise-orm-model

Text Classification

preference-learning

agentic-reasoning

outcome-reward-model

pairwise-preference

LossFunctionLover committed 12 days ago

Commit 4812ee0 · verified · 1 Parent(s): 6e09cf3

Update README.md

Files changed (1)

README.md +3 -4

@@ -360,18 +360,17 @@ This work builds upon and complements:
 If you use this model in your research, please cite:
 ```bibtex
-@article{mishra2026orm,
+@article{mishra2026pairwise-orm,
   title={Stable Outcome Reward Modeling via Pairwise Preference Learning},
   author={Mishra, Aklesh},
-  journal={arXiv preprint},
   year={2026},
-  note={Under review}
+  note={Preprint}
 }
 ```
 ## 🔗 Resources
-- 📄 **Paper**: Submitted to arXiv (under review)
+- 📄 **Paper**: Preprint
 - 💾 **Dataset**: [HuggingFace](https://huggingface.co/datasets/LossFunctionLover/orm-pairwise-preference-pairs)
 ## 📧 Contact