leaderboard-pr-bot committed on
Commit
4c9c144
1 Parent(s): f3c5e13

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
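The bot works by editing the YAML front matter at the top of README.md, which Hugging Face parses as the model-card metadata. As a minimal stdlib-only sketch of pulling that block out of a card (the sample card text below is illustrative, not the full README):

```python
import re

# Illustrative model-card text; the YAML front matter sits between
# the first two "---" fence lines, followed by the markdown body.
card = """---
license: apache-2.0
language:
- en
---
# TinyLlama Merge
"""

# Capture everything between the opening and closing "---" fences.
match = re.match(r"---\n(.*?)\n---\n", card, flags=re.DOTALL)
front_matter = match.group(1)
print(front_matter.splitlines()[0])  # license: apache-2.0
```

A real tool would feed `front_matter` to a YAML parser (e.g. PyYAML) before merging in the `model-index` entries; the regex split alone is only the first step.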

Files changed (1): README.md (+123 −7)
README.md CHANGED
```diff
@@ -1,4 +1,10 @@
 ---
+language:
+- en
+license: apache-2.0
+tags:
+- merge
+- llama
 base_model:
 - aihub-app/zyte-1B
 - TinyLlama/TinyLlama-1.1B-Chat-v1.0
@@ -6,12 +12,109 @@ base_model:
 - sreeramajay/TinyLlama-1.1B-orca-v1.0
 - vihangd/DopeyTinyLlama-1.1B-v1
 - kevin009/lamatama
-tags:
-- merge
-- llama
-license: apache-2.0
-language:
-- en
+model-index:
+- name: tinyllama-dare
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 37.29
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/tinyllama-dare
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 62.78
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/tinyllama-dare
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 25.2
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/tinyllama-dare
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 39.01
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/tinyllama-dare
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 65.9
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/tinyllama-dare
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 1.67
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/tinyllama-dare
+      name: Open LLM Leaderboard
 ---
 # TinyLlama Merge
@@ -38,4 +141,17 @@ The users of this model (hereinafter referred to as "the Model") should be aware
 
 + Use at Your Own Risk: The Model is provided "as is," and the developers make no representations or warranties of any kind concerning the Model's performance or suitability for any particular purpose. The user assumes full responsibility and risk of loss resulting from using the Model.
 
-By using the Model, users acknowledge and agree to the terms stated in this disclaimer. This disclaimer is subject to change without notice, and the latest version can be found on the Model's Hugging Face page.
+By using the Model, users acknowledge and agree to the terms stated in this disclaimer. This disclaimer is subject to change without notice, and the latest version can be found on the Model's Hugging Face page.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_andrijdavid__tinyllama-dare)
+
+| Metric                          |Value|
+|---------------------------------|----:|
+|Avg.                             |38.64|
+|AI2 Reasoning Challenge (25-Shot)|37.29|
+|HellaSwag (10-Shot)              |62.78|
+|MMLU (5-Shot)                    |25.20|
+|TruthfulQA (0-shot)              |39.01|
+|Winogrande (5-shot)              |65.90|
+|GSM8k (5-shot)                   | 1.67|
+
```
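The added section reports an `Avg.` of 38.64 across the six benchmarks; a quick stdlib check confirms that this is the plain arithmetic mean of the per-benchmark scores:

```python
# Per-benchmark scores as reported in the model-index metadata above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 37.29,
    "HellaSwag (10-Shot)": 62.78,
    "MMLU (5-Shot)": 25.20,
    "TruthfulQA (0-shot)": 39.01,
    "Winogrande (5-shot)": 65.90,
    "GSM8k (5-shot)": 1.67,
}

# Unweighted mean, rounded to two decimals as in the table.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 38.64
```

Note the average is unweighted: each benchmark counts equally regardless of its metric type (`acc`, `acc_norm`, or `mc2`).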