---
library_name: transformers
tags:
- llm
- Large Language Model
- llama3
- ORPO
- ORPO β
license: apache-2.0
datasets:
- heegyu/hh-rlhf-ko
language:
- ko
---

# Model Card for llama3-8b-instruct-orpo-ko

## Model Summary

This model is a fine-tuned version of the meta-llama/Meta-Llama-3-8B-Instruct using the [odds ratio preference optimization (ORPO)](https://arxiv.org/abs/2403.07691). 

It has been trained to perform NLP tasks in Korean.

## Model Details

### Model Description

- **Developed by:** Sungjoo Byun (Grace Byun)
- **Language(s) (NLP):** Korean
- **License:** Apache 2.0
- **Finetuned from model:** meta-llama/Meta-Llama-3-8B-Instruct

## Training Details

### Training Data

The model was trained using the dataset [heegyu/hh-rlhf-ko](https://huggingface.co/datasets/heegyu/hh-rlhf-ko). We appreciate heegyu for sharing this valuable resource.

### Training Procedure

We applied ORPO β to llama3-8b-instruct. The training was conducted on an A100 GPU with 80GB of memory.

## How to Get Started with the Model

Use the code below to get started with the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SungJoo/llama3-8b-instruct-orpo-ko")
model = AutoModelForCausalLM.from_pretrained("SungJoo/llama3-8b-instruct-orpo-ko")
```


## Citations

Please cite the ORPO paper and our model as follows:

```bibtex
@misc{hong2024orpo,
      title={ORPO: Monolithic Preference Optimization without Reference Model}, 
      author={Jiwoo Hong and Noah Lee and James Thorne},
      year={2024},
      eprint={2403.07691},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
```bibtex
@misc{byun,
  author = {Sungjoo Byun},
  title = {llama3-8b-orpo-ko},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/SungJoo/llama3-8b-instruct-orpo-ko}}
}
```

## Contact

For any questions or issues, please contact byunsj@snu.ac.kr.