alvarobartt (HF staff) committed on
Commit
b875065
1 Parent(s): 75dfa6c

Add evaluation results (LM Eval Harness, MT-Bench, AlpacaEval)


Note that MMLU is missing from https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/argilla/notus-7b-v1/results_2023-11-29T22-16-51.521321.json, so it hasn't been included in the results.

Files changed (1)
  1. README.md +134 -3
README.md CHANGED
@@ -1,7 +1,4 @@
 ---
-model-index:
-- name: notus-7b-v1
-  results: []
 datasets:
 - argilla/ultrafeedback-binarized-preferences
 language:
@@ -15,6 +12,140 @@ tags:
 - preference
 - ultrafeedback
 license: mit
+model-index:
+- name: notus-7b-v1
+  results:
+  # AI2 Reasoning Challenge (25-Shot)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      name: normalized accuracy
+      value: 0.6459044368600683
+    source:
+      name: Open LLM Leaderboard Results
+      url: https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/argilla/notus-7b-v1/results_2023-11-29T22-16-51.521321.json
+  # HellaSwag (10-shot)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      name: normalized accuracy
+      value: 0.8478390758812986
+    source:
+      name: Open LLM Leaderboard Results
+      url: https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/argilla/notus-7b-v1/results_2023-11-29T22-16-51.521321.json
+  # DROP (3-shot)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Drop (3-Shot)
+      type: drop
+      split: validation
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: f1
+      name: f1 score
+      value: 0.08913590604026835
+    source:
+      name: Open LLM Leaderboard Results
+      url: https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/argilla/notus-7b-v1/results_2023-11-29T22-16-51.521321.json
+  # TruthfulQA (0-shot)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 0.5436768358952805
+    source:
+      name: Open LLM Leaderboard Results
+      url: https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/argilla/notus-7b-v1/results_2023-11-29T22-16-51.521321.json
+  # GSM8k (5-shot)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      name: accuracy
+      value: 0.1516300227445034
+    source:
+      name: Open LLM Leaderboard Results
+      url: https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/argilla/notus-7b-v1/results_2023-11-29T22-16-51.521321.json
+  # Winogrande (5-shot)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      name: accuracy
+      value: 0.7940015785319653
+    source:
+      name: Open LLM Leaderboard Results
+      url: https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/argilla/notus-7b-v1/results_2023-11-29T22-16-51.521321.json
+  # AlpacaEval
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AlpacaEval
+      type: tatsu-lab/alpaca_eval
+    metrics:
+    - type: tatsu-lab/alpaca_eval
+      name: win rate
+      value: 0.9142
+    source:
+      url: https://tatsu-lab.github.io/alpaca_eval/
+  # MT-Bench
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MT-Bench
+      type: unknown
+    metrics:
+    - type: unknown
+      name: score
+      value: 7.30
+    source:
+      url: https://huggingface.co/spaces/lmsys/mt-bench
 ---
 
 <div align="center">
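Not part of the commit, but the six LM Eval Harness values it adds can be summarized with a short script. This is a sketch only: the dictionary keys are illustrative labels I chose, the values are copied from the metadata above, and the six-task mean is *not* the official Open LLM Leaderboard average, since MMLU is missing from the linked results file (and AlpacaEval/MT-Bench use different scales, so they are excluded here).

```python
# LM Eval Harness metric values copied from the model-index entries
# added in this commit (MMLU is absent from the linked results file).
# Key names are illustrative labels, not identifiers from the commit.
lm_eval_harness = {
    "arc_challenge_acc_norm": 0.6459044368600683,
    "hellaswag_acc_norm": 0.8478390758812986,
    "drop_f1": 0.08913590604026835,
    "truthfulqa_mc2": 0.5436768358952805,
    "gsm8k_acc": 0.1516300227445034,
    "winogrande_acc": 0.7940015785319653,
}

# Simple mean over the six reported tasks; this differs from the
# official leaderboard average, which includes MMLU.
average = sum(lm_eval_harness.values()) / len(lm_eval_harness)
print(f"Mean over {len(lm_eval_harness)} LM Eval Harness tasks: {average:.4f}")
```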