Update README.md
README.md
CHANGED
@@ -360,18 +360,17 @@ This work builds upon and complements:
 If you use this model in your research, please cite:
 
 ```bibtex
-@article{
+@article{mishra2026pairwise-orm,
 title={Stable Outcome Reward Modeling via Pairwise Preference Learning},
 author={Mishra, Aklesh},
-journal={arXiv preprint},
 year={2026},
-note={
+note={Preprint}
 }
 ```
 
 ## Resources
 
-- **Paper**:
+- **Paper**: Preprint
 - 💾 **Dataset**: [HuggingFace](https://huggingface.co/datasets/LossFunctionLover/orm-pairwise-preference-pairs)
 
 ## 📧 Contact