mlabonne committed
Commit 8131e42 • 1 Parent(s): 54b5445

Update README.md

Files changed (1)
  1. README.md +18 -7
README.md CHANGED
@@ -6,6 +6,8 @@ library_name: transformers
  tags:
  - orpo
  - llama 3
+ - rlhf
+ - sft
  datasets:
  - mlabonne/orpo-dpo-mix-40k
  ---
@@ -14,12 +16,14 @@ datasets:

  ![](https://i.imgur.com/ZHwzQvI.png)

- This is a quick fine-tune of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on 1k samples of [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) created for [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3).
-
- It's not very good at the moment (it's the sassiest model ever), but I'm currently training a version on the entire dataset.
+ This is an ORPO fine-tune of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on 1k samples of [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) created for [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3).

  **Try the demo**: https://huggingface.co/spaces/mlabonne/OrpoLlama-3-8B

+ ## 🔎 Application
+
+ This model uses a context window of 8k. It was trained with the ChatML template.
+
  ## 🏆 Evaluation

  ### Nous
@@ -28,15 +32,22 @@ Evaluation performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoev

  | Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
  | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------: | --------: | --------: | ---------: | --------: |
- | [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) [📄](https://gist.github.com/mlabonne/88b21dd9698ffed75d6163ebdc2f6cc8) | 52.42 | 42.75 | 72.99 | 52.99 | 40.94 |
  | [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) [📄](https://gist.github.com/mlabonne/8329284d86035e6019edb11eb0933628) | 51.34 | 41.22 | 69.86 | 51.65 | 42.64 |
- | [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) [📄](https://gist.github.com/mlabonne/7a0446c3d30dfce72834ef780491c4b2) | 49.15 | 33.36 | 67.87 | 55.89 | 39.48 |
- | [**mlabonne/OrpoLlama-3-8B**](https://huggingface.co/mlabonne/OrpoLlama-3-8B) [📄](https://gist.github.com/mlabonne/f41dad371d1781d0434a4672fd6f0b82) | **46.76** | **31.56** | **70.19** | **48.11** | **37.17** |
+ | [**mlabonne/OrpoLlama-3-8B**](https://huggingface.co/mlabonne/OrpoLlama-3-8B) [📄](https://gist.github.com/mlabonne/22896a1ae164859931cc8f4858c97f6f) | **48.63** | **34.17** | **70.59** | **52.39** | **37.36** |
+ | [mlabonne/OrpoLlama-3-8B-1k](https://huggingface.co/mlabonne/OrpoLlama-3-8B) [📄](https://gist.github.com/mlabonne/f41dad371d1781d0434a4672fd6f0b82) | 46.76 | 31.56 | 70.19 | 48.11 | 37.17 |
  | [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) [📄](https://gist.github.com/mlabonne/616b6245137a9cfc4ea80e4c6e55d847) | 45.42 | 31.1 | 69.95 | 43.91 | 36.7 |

+ `mlabonne/OrpoLlama-3-8B-1k` corresponds to a version of this model trained on 1K samples (you can see the parameters in [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3)).
+
+ ### Open LLM Leaderboard
+
+ TBD.
+
  ## 📈 Training curves

- ![](https://i.imgur.com/r78hGrl.png)
+ You can find the experiment on W&B at [this address](https://wandb.ai/mlabonne/DPO/runs/vxnmq24z/workspace?nw=nwusermlabonne).
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/zm71HyZiG96YY1GUtpfHq.png)

  ## 💻 Usage

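The updated card describes an ORPO fine-tune of Llama 3 8B on `mlabonne/orpo-dpo-mix-40k`. A hedged, condensed sketch of such a setup with TRL's `ORPOTrainer` (the approach used in the linked article) is shown below; the hyperparameters, split name, and 1k-sample selection are illustrative placeholders, not the run's actual configuration.

```python
# Hedged sketch of an ORPO fine-tuning setup with TRL's ORPOTrainer and
# setup_chat_format (ChatML). Values below are illustrative placeholders,
# not the configuration actually used for this model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer, setup_chat_format

base_model = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Attach a ChatML chat template (and its special tokens) to the base model,
# one common way to realize the card's "trained with the ChatML template" note.
model, tokenizer = setup_chat_format(model, tokenizer)

# orpo-dpo-mix-40k stores chosen/rejected answers as message lists; render
# them to plain text with the chat template so the trainer receives strings.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
dataset = dataset.shuffle(seed=42).select(range(1000))  # "-1k" variant used a 1k-sample subset

def to_text(row):
    row["chosen"] = tokenizer.apply_chat_template(row["chosen"], tokenize=False)
    row["rejected"] = tokenizer.apply_chat_template(row["rejected"], tokenize=False)
    return row

dataset = dataset.map(to_text)

orpo_args = ORPOConfig(
    output_dir="./orpo-llama-3-8b",
    beta=0.1,                      # weight of the odds-ratio (preference) term
    learning_rate=8e-6,
    max_length=2048,
    max_prompt_length=1024,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```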
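The card also notes an 8k context window and ChatML formatting, and its `## 💻 Usage` section is truncated in this diff. A minimal, illustrative way to load and chat with the model via 🤗 Transformers, assuming the repo's tokenizer ships the ChatML `chat_template` (this is not the snippet from the commit itself):

```python
# Hedged usage sketch: load OrpoLlama-3-8B with Transformers and chat with it.
# Assumes the repo's tokenizer carries a ChatML chat_template, as the card states.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlabonne/OrpoLlama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is ORPO in one sentence?"}]
# apply_chat_template renders the ChatML turns (<|im_start|> ... <|im_end|>)
# and appends the assistant prefix so the model answers as the assistant.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Staying within the stated 8k context window means keeping the rendered prompt plus `max_new_tokens` under roughly 8,192 tokens.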