kuotient committed on
Commit 256003e
1 Parent(s): 82185d8

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -20,7 +20,7 @@ Alpha-Instruct is our latest language model, developed using 'Evolutionary Model
  - [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) (Instruct)
  - [Llama-3-Open-Ko-8B](beomi/Llama-3-Open-Ko-8B) (Continual Pretrained)
 
- To refine and enhance Alpha-Instruct, we utilized a carefully curated high-quality datasets aimed at 'healing' the model's output, significantly boosting its human preference scores. We use [ORPO] (https://arxiv.org/abs/2403.07691) specifically for this "healing" (RLHF) phase. The datasets* used include:
+ To refine and enhance Alpha-Instruct, we utilized a carefully curated high-quality datasets aimed at 'healing' the model's output, significantly boosting its human preference scores. We use [ORPO](https://arxiv.org/abs/2403.07691) specifically for this "healing" (RLHF) phase. The datasets* used include:
  - [Korean-Human-Judgements](https://huggingface.co/datasets/HAERAE-HUB/Korean-Human-Judgements)
  - [Orca-Math](https://huggingface.co/datasets/kuotient/orca-math-word-problems-193k-korean)
  - [dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k)
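The only change in this commit removes the space between `[ORPO]` and `(https://arxiv.org/abs/2403.07691)`. CommonMark requires an inline link's parenthesized destination to follow the bracketed link text immediately, so the old line would render `[ORPO]` as literal text rather than a link (unless the README defines an `[ORPO]` reference, which this diff does not show). The sketch below illustrates the difference with a simplified regular expression. The pattern and the `find_inline_links` helper are hypothetical and serve only as illustration; neither is a real Markdown parser nor part of the repository.

```python
import re

# Hypothetical, simplified pattern for a CommonMark inline link: bracketed
# link text followed *immediately* (no whitespace) by a parenthesized
# destination. This is an approximation for illustration only; it ignores
# link titles, nested brackets, and angle-bracket destinations.
INLINE_LINK = re.compile(r"\[([^\[\]]+)\]\(([^()\s]+)\)")


def find_inline_links(markdown: str) -> list[tuple[str, str]]:
    """Return (text, destination) pairs for inline links in `markdown`."""
    return INLINE_LINK.findall(markdown)


# The ORPO sentence before and after the commit (shortened).
before = "We use [ORPO] (https://arxiv.org/abs/2403.07691) specifically for this phase."
after = "We use [ORPO](https://arxiv.org/abs/2403.07691) specifically for this phase."

print(find_inline_links(before))  # []
print(find_inline_links(after))   # [('ORPO', 'https://arxiv.org/abs/2403.07691')]
```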