ArkaAbacus committed "Update README.md" (commit a840a3d, parent 2a23d55)

README.md CHANGED:
@@ -28,6 +28,23 @@ The model outperforms Llama-3-70B-Instruct substantially, and is on par with GPT
 
 ## Evaluation
 
+### Arena-Hard
+
+Score vs selected others (sourced from https://lmsys.org/blog/2024-04-19-arena-hard/#full-leaderboard-with-gpt-4-turbo-as-judge)
+
+| Model | Score | 95% Confidence Interval | Average Tokens |
+| :---- | ---------: | ----------: | ------: |
+| GPT-4-Turbo-2024-04-09 | 82.6 | (-1.8, 1.6) | 662 |
+| Claude-3-Opus-20240229 | 60.4 | (-3.3, 2.4) | 541 |
+| **Smaug-Llama-3-70B-Instruct** | 56.7 | (-2.2, 2.6) | 661 |
+| Llama-3-70B-Instruct | 41.1 | (-2.5, 2.4) | 583 |
+| Mistral-Large-2402 | 37.7 | (-1.9, 2.6) | 400 |
+| Mixtral-8x22B-Instruct-v0.1 | 36.4 | (-2.7, 2.9) | 430 |
+| Qwen1.5-72B-Chat | 36.1 | (-2.5, 2.2) | 474 |
+| Command-R-Plus | 33.1 | (-2.1, 2.2) | 541 |
+| Mistral-Medium | 31.9 | (-2.3, 2.4) | 485 |
+| GPT-3.5-Turbo-0613 | 24.8 | (-1.6, 2.0) | 401 |
+
 ### MT-Bench
 
 ```
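
The 95% confidence intervals in the Arena-Hard table above are, per the linked LMSYS blog post, obtained by bootstrapping judged battle outcomes. As a rough illustration only (the actual leaderboard bootstraps Bradley-Terry coefficients, not raw win rates, and this sketch is not the model card's or LMSYS's code), a percentile bootstrap over per-battle win/tie/loss scores looks like this; `bootstrap_ci` and the simulated `outcomes` data are hypothetical:

```python
import random

def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a mean win rate.

    `outcomes` holds per-battle scores: 1.0 = win, 0.5 = tie, 0.0 = loss.
    Illustrative sketch only; the Arena-Hard leaderboard bootstraps
    Bradley-Terry model coefficients rather than raw win rates.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample n battles with replacement, n_resamples times; sort the means.
    means = sorted(
        sum(rng.choice(outcomes) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high

# Hypothetical data: 100 simulated battles with a 57% observed win rate.
outcomes = [1.0] * 57 + [0.0] * 43
low, high = bootstrap_ci(outcomes)
```

The interval `(low, high)` brackets the observed win rate, analogous to reading a table row such as 56.7 with interval (-2.2, 2.6) as a score between 54.5 and 59.3.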