leaderboard-pr-bot committed
Commit b5c99d2 · verified · 1 Parent(s): cd4d2fa

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +120 -4
README.md CHANGED
@@ -1,19 +1,122 @@
  ---
  license: apache-2.0
  datasets:
  - Skylion007/openwebtext
  - JeanKaddour/minipile
- language:
- - en
  pipeline_tag: text-generation
  inference:
    parameters:
-     do_sample: True
      temperature: 0.5
      top_p: 0.5
      top_k: 50
      max_new_tokens: 250
      repetition_penalty: 1.176
  ---
  A pre-trained language model, based on the Mistral 7B model, has been scaled down to approximately 248 million parameters. This model has been trained on 7,488,000 examples. This model isn't intended for direct use but for fine-tuning on a downstream task.
  This model should have a context length of around 32,768 tokens. Safe serialization has been removed due to issues saving model weights.
@@ -34,4 +137,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
  | DROP (3-shot) | 0.74 |


- The purpose of this model is to prove that trillion-scale datasets are not needed to pretrain a language model. As a result of needing small datasets, this model was pretrained on a single GPU (Titan V).
  ---
+ language:
+ - en
  license: apache-2.0
  datasets:
  - Skylion007/openwebtext
  - JeanKaddour/minipile
  pipeline_tag: text-generation
  inference:
    parameters:
+     do_sample: true
      temperature: 0.5
      top_p: 0.5
      top_k: 50
      max_new_tokens: 250
      repetition_penalty: 1.176
+ model-index:
+ - name: TinyMistral-248m
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 22.87
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248m
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 28.02
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248m
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 23.15
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248m
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 42.52
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248m
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 49.8
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248m
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 0.0
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248m
+       name: Open LLM Leaderboard
  ---
  A pre-trained language model, based on the Mistral 7B model, has been scaled down to approximately 248 million parameters. This model has been trained on 7,488,000 examples. This model isn't intended for direct use but for fine-tuning on a downstream task.
  This model should have a context length of around 32,768 tokens. Safe serialization has been removed due to issues saving model weights.
 
  | DROP (3-shot) | 0.74 |


+ The purpose of this model is to prove that trillion-scale datasets are not needed to pretrain a language model. As a result of needing small datasets, this model was pretrained on a single GPU (Titan V).
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Locutusque__TinyMistral-248m)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |27.73|
+ |AI2 Reasoning Challenge (25-Shot)|22.87|
+ |HellaSwag (10-Shot)              |28.02|
+ |MMLU (5-Shot)                    |23.15|
+ |TruthfulQA (0-shot)              |42.52|
+ |Winogrande (5-shot)              |49.80|
+ |GSM8k (5-shot)                   | 0.00|
+
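
The "Avg." row in the added table is consistent with an unweighted mean of the six benchmark scores. The sketch below checks that arithmetic; treating the average as a plain mean rounded to two decimals is an assumption inferred from the numbers, not something the PR states, and `leaderboard_average` is a hypothetical helper name.

```python
# Scores copied from the results table added by this PR (percentages).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 22.87,
    "HellaSwag (10-Shot)": 28.02,
    "MMLU (5-Shot)": 23.15,
    "TruthfulQA (0-shot)": 42.52,
    "Winogrande (5-shot)": 49.80,
    "GSM8k (5-shot)": 0.00,
}


def leaderboard_average(results):
    """Unweighted mean of the benchmark scores, rounded to two decimals.

    Assumption: the "Avg." row is a plain mean over all listed benchmarks.
    """
    return round(sum(results.values()) / len(results), 2)


average = leaderboard_average(scores)
print(f"Avg. {average:.2f}")  # matches the 27.73 shown in the table
```

The scores sum to 166.36, and 166.36 / 6 ≈ 27.727, which rounds to the 27.73 reported in the table.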