Model Card for llama3-8b-instruct-orpo-ko

Model Summary

This model is a fine-tuned version of the meta-llama/Meta-Llama-3-8B-Instruct using the odds ratio preference optimization (ORPO).

It has been trained to perform NLP tasks in Korean.

Model Details

Model Description

Developed by: Sungjoo Byun (Grace Byun)
Language(s) (NLP): Korean
License: Apache 2.0
Finetuned from model: meta-llama/Meta-Llama-3-8B-Instruct

Training Details

Training Data

The model was trained using the dataset heegyu/hh-rlhf-ko. We appreciate heegyu for sharing this valuable resource.

Training Procedure

We applied ORPO β to llama3-8b-instruct. The training was conducted on an A100 GPU with 80GB of memory.

How to Get Started with the Model

Use the code below to get started with the model:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SungJoo/llama3-8b-instruct-orpo-ko")
model = AutoModelForCausalLM.from_pretrained("SungJoo/llama3-8b-instruct-orpo-ko")

Citations

Please cite the ORPO paper and our model as follows:

@misc{hong2024orpo,
      title={ORPO: Monolithic Preference Optimization without Reference Model}, 
      author={Jiwoo Hong and Noah Lee and James Thorne},
      year={2024},
      eprint={2403.07691},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{byun,
  author = {Sungjoo Byun},
  title = {llama3-8b-orpo-ko},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/SungJoo/llama3-8b-instruct-orpo-ko}}
}

Contact

For any questions or issues, please contact byunsj@snu.ac.kr.

SungJoo
/

llama3-8b-instruct-orpo-ko