metadata
library_name: transformers
tags:
- llm
- Large Language Model
- llama3
- ORPO
- ORPO β
license: apache-2.0
datasets:
- heegyu/hh-rlhf-ko
language:
- ko
Model Card for llama3-8b-instruct-orpo-ko
Model Summary
This model is a fine-tuned version of the meta-llama/Meta-Llama-3-8B-Instruct using the odds ratio preference optimization (ORPO).
It has been trained to perform NLP tasks in Korean.
Model Details
Model Description
- Developed by: Sungjoo Byun (Grace Byun)
- Language(s) (NLP): Korean
- License: Apache 2.0
- Finetuned from model: meta-llama/Meta-Llama-3-8B-Instruct
Training Details
Training Data
The model was trained using the dataset heegyu/hh-rlhf-ko. We appreciate heegyu for sharing this valuable resource.
Training Procedure
We applied ORPO β to llama3-8b-instruct. The training was conducted on an A100 GPU with 80GB of memory.
How to Get Started with the Model
Use the code below to get started with the model:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("SungJoo/llama3-8b-instruct-orpo-ko")
model = AutoModelForCausalLM.from_pretrained("SungJoo/llama3-8b-instruct-orpo-ko")
Citations
Please cite the ORPO paper and our model as follows:
@misc{hong2024orpo,
title={ORPO: Monolithic Preference Optimization without Reference Model},
author={Jiwoo Hong and Noah Lee and James Thorne},
year={2024},
eprint={2403.07691},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{byun,
author = {Sungjoo Byun},
title = {llama3-8b-orpo-ko},
year = {2024},
publisher = {Hugging Face},
journal = {Hugging Face repository},
howpublished = {\url{https://huggingface.co/SungJoo/llama3-8b-instruct-orpo-ko}}
}
Model Card Contact
For any questions or issues, please contact byunsj@snu.ac.kr.