---
base_model: mlabonne/OrpoLlama-3-8B
language:
- en
license: other
library_name: transformers
datasets:
- mlabonne/orpo-dpo-mix-40k
tags:
- 4-bit
- AWQ
- text-generation
- autotrain_compatible
- endpoints_compatible
- orpo
- llama 3
- rlhf
- sft
pipeline_tag: text-generation
inference: false
quantized_by: Suparious
---
# mlabonne/OrpoLlama-3-8B AWQ

- Model creator: [mlabonne](https://huggingface.co/mlabonne)
- Original model: [OrpoLlama-3-8B](https://huggingface.co/mlabonne/OrpoLlama-3-8B)

![](https://i.imgur.com/ZHwzQvI.png)
## Model Summary

This is an ORPO fine-tune of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on 1k samples of [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k), created for [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3).

It's a successful fine-tune that follows the ChatML template!

**Try the demo**: https://huggingface.co/spaces/mlabonne/OrpoLlama-3-8B
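
For reference, a fine-tune of this kind can be reproduced with TRL's `ORPOTrainer`. The snippet below is only a minimal sketch under that assumption, not the exact script from the article: the hyperparameters, output path, and dataset preprocessing are illustrative choices.

```python
# Minimal ORPO fine-tuning sketch (assumes trl >= 0.8.2, transformers, datasets).
# Hyperparameters and column handling are illustrative, not the article's exact setup.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3's tokenizer has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# The card states 1k samples of mlabonne/orpo-dpo-mix-40k were used.
# Depending on the TRL version, the chosen/rejected message lists may first need to be
# rendered to plain text with the chat template.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train").shuffle(seed=42).select(range(1000))

config = ORPOConfig(
    output_dir="orpo-llama-3-8b",   # hypothetical output path
    beta=0.1,                       # ORPO odds-ratio weight; assumed value
    learning_rate=8e-6,             # assumed
    per_device_train_batch_size=2,  # assumed
    num_train_epochs=1,
    max_length=1024,
    max_prompt_length=512,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer TRL versions use processing_class instead
)
trainer.train()
```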
## Application

This model uses a context window of 8k tokens. It was trained with the ChatML template, as the usage sketch below illustrates.
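
A minimal inference sketch with `transformers` follows; loading AWQ weights additionally requires the `autoawq` package. The repo id and generation settings here are assumptions: replace `model_id` with the repository this card belongs to.

```python
# Minimal inference sketch for the AWQ-quantized model (assumes transformers + autoawq installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "solidrust/OrpoLlama-3-8B-AWQ"  # assumed repo id; replace with this card's repo if different
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The tokenizer's chat template renders messages in ChatML format.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain ORPO in one sentence."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```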
## Evaluation

### Nous

OrpoLlama-3-8B outperforms Llama-3-8B-Instruct on the GPT4All and TruthfulQA datasets.