radm committed on
Commit
ad2b123
1 Parent(s): 410f08f

Update README.md

  1. README.md +25 -2
README.md CHANGED
@@ -24,7 +24,7 @@ This is a LORA adapter for NousResearch/Meta-Llama-3-70B-Instruct, fine-tuned to
 ## Uses
 
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-[More Information Needed]
+Use repository (https://github.com/radaevm/arena-hard-local) for evaluate with local judge model.
 
 ## Training Details
 
@@ -51,8 +51,31 @@ Datasets:
 
 ### Results
 
-[More Information Needed]
 
+#### Llama-3-70B-Instruct-GPTQ as judge:
+```console
+Llama-3-Instruct-8B-SimPO | score: 78.3 | 95% CI: (-1.5, 1.2) | average #tokens: 545
+SELM-Llama-3-8B-Instruct-iter-3 | score: 72.8 | 95% CI: (-2.1, 1.4) | average #tokens: 606
+Meta-Llama-3-8B-Instruct-f16 | score: 65.3 | 95% CI: (-1.8, 2.1) | average #tokens: 560
+suzume-llama-3-8B-multilingual-orpo-borda-half | score: 63.5 | 95% CI: (-1.6, 2.1) | average #tokens: 978
+Phi-3-medium-128k-instruct | score: 50.0 | 95% CI: (0.0, 0.0) | average #tokens: 801
+suzume-llama-3-8B-multilingual | score: 48.1 | 95% CI: (-2.2, 1.8) | average #tokens: 767
+aya-23-8B | score: 48.0 | 95% CI: (-2.0, 2.1) | average #tokens: 834
+Vikhr-7B-instruct_0.5 | score: 19.6 | 95% CI: (-1.3, 1.5) | average #tokens: 794
+alpindale_gemma-2b-it | score: 11.2 | 95% CI: (-1.0, 0.8) | average #tokens: 425
+```
+#### Llama-3-70B-Instruct-AH-AWQ as judge:
+```console
+Llama-3-Instruct-8B-SimPO | score: 83.8 | 95% CI: (-1.4, 1.3) | average #tokens: 545
+SELM-Llama-3-8B-Instruct-iter-3 | score: 78.8 | 95% CI: (-1.7, 1.9) | average #tokens: 606
+suzume-llama-3-8B-multilingual-orpo-borda-half | score: 71.8 | 95% CI: (-1.7, 2.4) | average #tokens: 978
+Meta-Llama-3-8B-Instruct-f16 | score: 69.8 | 95% CI: (-1.9, 1.7) | average #tokens: 560
+suzume-llama-3-8B-multilingual | score: 54.0 | 95% CI: (-2.1, 2.1) | average #tokens: 767
+aya-23-8B | score: 50.4 | 95% CI: (-1.7, 1.7) | average #tokens: 834
+Phi-3-medium-128k-instruct | score: 50.0 | 95% CI: (0.0, 0.0) | average #tokens: 801
+Vikhr-7B-instruct_0.5 | score: 14.2 | 95% CI: (-1.3, 1.0) | average #tokens: 794
+alpindale_gemma-2b-it | score: 7.9 | 95% CI: (-0.9, 0.8) | average #tokens: 425
+```
 
 
 ## Hardware
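
Reviewer note: the leaderboard lines added under Results all follow the same `model | score: … | 95% CI: (…) | average #tokens: …` layout, so the two judges can be compared programmatically. Below is a minimal sketch, assuming the lines are copied without the diff `+` prefix. `Row`, `parse_row`, `parse_board`, and `score_deltas` are hypothetical helper names for illustration and are not part of arena-hard-local. Only three rows per judge are reproduced here to keep the example short.

```python
import re
from typing import Dict, NamedTuple


class Row(NamedTuple):
    model: str
    score: float
    ci_low: float
    ci_high: float
    avg_tokens: int


# One leaderboard line in the layout shown in the README results.
ROW_RE = re.compile(
    r"^\s*(?P<model>\S+)\s*\|\s*score:\s*(?P<score>[\d.]+)\s*"
    r"\|\s*95% CI:\s*\((?P<lo>-?[\d.]+),\s*(?P<hi>-?[\d.]+)\)\s*"
    r"\|\s*average #tokens:\s*(?P<tokens>\d+)\s*$"
)


def parse_row(line: str) -> Row:
    """Parse a single leaderboard line into a Row; raise on unknown layouts."""
    m = ROW_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized leaderboard line: {line!r}")
    return Row(m["model"], float(m["score"]), float(m["lo"]),
               float(m["hi"]), int(m["tokens"]))


def parse_board(text: str) -> Dict[str, Row]:
    """Parse a block of leaderboard lines, skipping blank lines."""
    rows = (parse_row(line) for line in text.splitlines() if line.strip())
    return {row.model: row for row in rows}


def score_deltas(a: Dict[str, Row], b: Dict[str, Row]) -> Dict[str, float]:
    """Score under judge b minus score under judge a, for models in both."""
    return {m: round(b[m].score - a[m].score, 1) for m in a if m in b}


# Three rows per judge, copied from the results in the diff above.
GPTQ_JUDGE = """\
Llama-3-Instruct-8B-SimPO | score: 78.3 | 95% CI: (-1.5, 1.2) | average #tokens: 545
Meta-Llama-3-8B-Instruct-f16 | score: 65.3 | 95% CI: (-1.8, 2.1) | average #tokens: 560
suzume-llama-3-8B-multilingual-orpo-borda-half | score: 63.5 | 95% CI: (-1.6, 2.1) | average #tokens: 978
"""

AWQ_JUDGE = """\
Llama-3-Instruct-8B-SimPO | score: 83.8 | 95% CI: (-1.4, 1.3) | average #tokens: 545
suzume-llama-3-8B-multilingual-orpo-borda-half | score: 71.8 | 95% CI: (-1.7, 2.4) | average #tokens: 978
Meta-Llama-3-8B-Instruct-f16 | score: 69.8 | 95% CI: (-1.9, 1.7) | average #tokens: 560
"""

gptq = parse_board(GPTQ_JUDGE)
awq = parse_board(AWQ_JUDGE)
deltas = score_deltas(gptq, awq)

for model, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {delta:+.1f}")
```

A per-model delta like this makes it easier to see where the two judges disagree, for example the order of `Meta-Llama-3-8B-Instruct-f16` and `suzume-llama-3-8B-multilingual-orpo-borda-half` swapping between the two tables.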