---
license: llama3
pipeline_tag: text-generation
base_model: Salesforce/LLaMA-3-8B-SFR-SFT-R
---

# LLaMA-3-8B-SFR-SFT-R-GGUF

This is a quantized version of [Salesforce/LLaMA-3-8B-SFR-SFT-R](https://huggingface.co/Salesforce/LLaMA-3-8B-SFR-SFT-R), created using llama.cpp.

# Model Description

This is the SFT model for Salesforce/SFR-Iterative-DPO-LLaMA-3-8B-R.

## Model Releases

- [SFT model](https://huggingface.co/Salesforce/LLaMA-3-8B-SFR-SFT-R)
- [Reward model](https://huggingface.co/Salesforce/LLaMA-3-8B-SFR-RM-R)
- [RLHF model](https://huggingface.co/Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R)

## Original Model Citation

Please cite our technical report if you find our model useful for your research or product.

```bibtex
@misc{dong2024rlhf,
      title={RLHF Workflow: From Reward Modeling to Online RLHF},
      author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
      year={2024},
      eprint={2405.07863},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```