Salesforce
/

LLaMA-3-8B-SFR-SFT-R

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

LLaMA-3-8B-SFR-SFT-R / README.md

bpucla's picture

Update README.md

91d91cc verified 4 months ago

|

history blame contribute delete

No virus

834 Bytes

	---
	license: llama3
	---
	# LLaMA-3-8B-SFR-SFT-R
	This is the SFT model for Salesforce/SFR-Iterative-DPO-LLaMA-3-8B-R.

	## Model Releases
	- [SFT model](https://huggingface.co/Salesforce/LLaMA-3-8B-SFR-SFT-R)
	- [Reward model](https://huggingface.co/Salesforce/LLaMA-3-8B-SFR-RM-R)
	- [RLHF model](https://huggingface.co/Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R)


	## Citation
	Please cite our techical report if you find our model is useful for your research or product.

	```bibtex
	@misc{dong2024rlhf,
	title={RLHF Workflow: From Reward Modeling to Online RLHF},
	author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
	year={2024},
	eprint={2405.07863},
	archivePrefix={arXiv},
	primaryClass={cs.LG}
	}
	```