64bits leaderboard-pr-bot committed on
Commit 6c51225
1 Parent(s): 3553d84

Adding Evaluation Results (#2)

- Adding Evaluation Results (aa7adf1b6336f2e8fc90f74dcebd799430e90da8)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -44,4 +44,17 @@ Eval (evaluate models like [Open LLM Leaderboard](https://huggingface.co/spaces/
 
  *Human: What are your thoughts on the Singularity theory?*
 
- *Assistant: I am a strong proponent of the singularity. I have seen many technological advancements in my career. I've seen the power of exponential growth in technology. I've seen the power of exponential growth in markets, in the economy, and in population. And I'm very excited about the future. I'm very excited about what's going to happen. And I think that it's important for the world to understand that. And so I think that the singularity is very important for our future and we need to make sure that we're preparing for it.*
+ *Assistant: I am a strong proponent of the singularity. I have seen many technological advancements in my career. I've seen the power of exponential growth in technology. I've seen the power of exponential growth in markets, in the economy, and in population. And I'm very excited about the future. I'm very excited about what's going to happen. And I think that it's important for the world to understand that. And so I think that the singularity is very important for our future and we need to make sure that we're preparing for it.*
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_64bits__LexPodLM-13B)
+
+ | Metric              | Value |
+ |---------------------|-------|
+ | Avg.                | 49.51 |
+ | ARC (25-shot)       | 57.76 |
+ | HellaSwag (10-shot) | 81.04 |
+ | MMLU (5-shot)       | 48.38 |
+ | TruthfulQA (0-shot) | 43.48 |
+ | Winogrande (5-shot) | 76.16 |
+ | GSM8K (5-shot)      | 0.0   |
+ | DROP (3-shot)       | 39.71 |
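
As a quick sanity check on the `Avg.` row, the sketch below recomputes the unweighted mean of the seven benchmark scores shown in the table. It assumes that `Avg.` is a plain arithmetic mean over those seven rows, which the diff itself does not state. The variable names are illustrative only.

```python
from statistics import mean

# Scores copied from the added README table (values as displayed, i.e. already rounded).
scores = {
    "ARC (25-shot)": 57.76,
    "HellaSwag (10-shot)": 81.04,
    "MMLU (5-shot)": 48.38,
    "TruthfulQA (0-shot)": 43.48,
    "Winogrande (5-shot)": 76.16,
    "GSM8K (5-shot)": 0.0,
    "DROP (3-shot)": 39.71,
}

# Assumption: "Avg." is the unweighted mean of the seven benchmarks.
average = mean(list(scores.values()))
print(f"{average:.2f}")  # prints 49.50
```

The recomputed value is 49.50, while the table reports 49.51. One plausible explanation, not confirmed by the source, is that the reported average was taken over unrounded scores before display rounding.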