Commit a840a3d, committed by ArkaAbacus
1 Parent(s): 2a23d55

Update README.md

Files changed (1)
  1. README.md +17 -0
README.md CHANGED
@@ -28,6 +28,23 @@ The model outperforms Llama-3-70B-Instruct substantially, and is on par with GPT
 
  ## Evaluation
 
+ ### Arena-Hard
+
+ Score vs selected others (sourced from: (https://lmsys.org/blog/2024-04-19-arena-hard/#full-leaderboard-with-gpt-4-turbo-as-judge))
+
+ | Model | Score | 95% Confidence Interval | Average Tokens |
+ | :---- | ---------: | ----------: | ------: |
+ | GPT-4-Turbo-2024-04-09 | 82.6 | (-1.8, 1.6) | 662 |
+ | Claude-3-Opus-20240229 | 60.4 | (-3.3, 2.4) | 541 |
+ | **Smaug-Llama-3-70B-Instruct Score** | 56.7 | (-2.2, 2.6) | 661 |
+ | Llama-3-70B-Instruct | 41.1 | (-2.5, 2.4) | 583 |
+ | Mistral-Large-2402 | 37.7 | (-1.9, 2.6) | 400 |
+ | Mixtral-8x22B-Instruct-v0.1 | 36.4 | (-2.7, 2.9) | 430 |
+ | Qwen1.5-72B-Chat | 36.1 | (-2.5, 2.2) | 474 |
+ | Command-R-Plus | 33.1 | (-2.1, 2.2) | 541 |
+ | Mistral-Medium | 31.9 | (-2.3, 2.4) | 485 |
+ | GPT-3.5-Turbo-0613 | 24.8 | (-1.6, 2.0) | 401 |
+
  ### MT-Bench
 
  ```