---
library_name: transformers
tags:
- llm
- Large Language Model
- llama3
- ORPO
- ORPO β
license: apache-2.0
datasets:
- heegyu/hh-rlhf-ko
language:
- ko
---
# Model Card for llama3-8b-instruct-orpo-ko
## Model Summary
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct, aligned with [odds ratio preference optimization (ORPO)](https://arxiv.org/abs/2403.07691).
It was trained to perform NLP tasks in Korean.
## Model Details
### Model Description
- **Developed by:** Sungjoo Byun (Grace Byun)
- **Language(s) (NLP):** Korean
- **License:** Apache 2.0
- **Finetuned from model:** meta-llama/Meta-Llama-3-8B-Instruct
## Training Details
### Training Data
The model was trained on the dataset [heegyu/hh-rlhf-ko](https://huggingface.co/datasets/heegyu/hh-rlhf-ko). We thank heegyu for sharing this valuable resource.
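For reference, the preference pairs can be inspected with the `datasets` library. The `chosen`/`rejected` column layout assumed below follows the original Anthropic HH-RLHF format and should be verified against the dataset card:

```python
from datasets import load_dataset

# Load the Korean HH-RLHF preference dataset used for training.
dataset = load_dataset("heegyu/hh-rlhf-ko")

print(dataset)               # available splits and column names
print(dataset["train"][0])   # one chosen/rejected preference pair
```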
### Training Procedure
We applied ORPO (with its β weighting coefficient) to llama3-8b-instruct. Training was conducted on a single A100 GPU with 80 GB of memory.
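The training script itself is not published. As a rough guide, the sketch below shows how such a run could look with TRL's `ORPOTrainer`, where `beta` is the coefficient (λ in the ORPO paper) weighting the odds-ratio preference loss against the supervised fine-tuning loss. Every hyperparameter value shown is an illustrative assumption, not the setting used for this model:

```python
# A minimal sketch of an ORPO run, assuming TRL's ORPOTrainer.
# All hyperparameter values below are assumptions, not the ones used here.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# ORPOTrainer expects preference pairs (prompt/chosen/rejected);
# hh-rlhf-ko may need a mapping step to match this schema.
train_dataset = load_dataset("heegyu/hh-rlhf-ko", split="train")

args = ORPOConfig(
    output_dir="llama3-8b-instruct-orpo-ko",
    beta=0.1,                       # weight of the odds-ratio term (assumed)
    per_device_train_batch_size=1,  # sized for a single 80GB A100 (assumed)
    gradient_accumulation_steps=8,
    learning_rate=8e-6,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,     # `tokenizer=` on older TRL releases
)
trainer.train()
```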
## How to Get Started with the Model
Use the code below to get started with the model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("SungJoo/llama3-8b-instruct-orpo-ko")
model = AutoModelForCausalLM.from_pretrained("SungJoo/llama3-8b-instruct-orpo-ko")
```
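Since the base model is a Llama 3 instruct model, prompts should be formatted with its chat template. A minimal generation example continuing from the block above; the Korean prompt ("Hello, please introduce yourself.") and the sampling settings are illustrative:

```python
messages = [{"role": "user", "content": "안녕하세요, 자기소개를 해주세요."}]

# Format the prompt with the Llama 3 chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```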
## Citations
Please cite the ORPO paper and our model as follows:
```bibtex
@misc{hong2024orpo,
  title={ORPO: Monolithic Preference Optimization without Reference Model},
  author={Jiwoo Hong and Noah Lee and James Thorne},
  year={2024},
  eprint={2403.07691},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
```bibtex
@misc{byun,
  author = {Sungjoo Byun},
  title = {llama3-8b-instruct-orpo-ko},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/SungJoo/llama3-8b-instruct-orpo-ko}}
}
```
## Contact
For any questions or issues, please contact byunsj@snu.ac.kr.