leaderboard-pr-bot committed on
Commit
2701afb
1 Parent(s): 1407000

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +118 -2
README.md CHANGED
@@ -1,6 +1,5 @@
  ---
  license: apache-2.0
- base_model: mistralai/mistral-7b-v0.1
  datasets:
  - ai2_arc
  - allenai/ultrafeedback_binarized_cleaned
@@ -43,6 +42,110 @@ datasets:
  - WhiteRabbitNeo/WRN-Chapter-1
  - WhiteRabbitNeo/WRN-Chapter-2
  - winogrande
+ base_model: mistralai/mistral-7b-v0.1
+ model-index:
+ - name: bagel-dpo-7b-v0.4
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 67.58
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-7b-v0.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 84.3
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-7b-v0.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 61.95
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-7b-v0.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 63.94
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-7b-v0.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 78.14
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-7b-v0.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 46.85
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-7b-v0.4
+       name: Open LLM Leaderboard
  ---
 
  # A bagel, with everything
@@ -810,4 +913,17 @@ https://bmc.link/jondurbin
 
  ETH 0xce914eAFC2fe52FdceE59565Dd92c06f776fcb11
 
- BTC bc1qdwuth4vlg8x37ggntlxu5cjfwgmdy5zaa7pswf
+ BTC bc1qdwuth4vlg8x37ggntlxu5cjfwgmdy5zaa7pswf
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_jondurbin__bagel-dpo-7b-v0.4)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |67.13|
+ |AI2 Reasoning Challenge (25-Shot)|67.58|
+ |HellaSwag (10-Shot)              |84.30|
+ |MMLU (5-Shot)                    |61.95|
+ |TruthfulQA (0-shot)              |63.94|
+ |Winogrande (5-shot)              |78.14|
+ |GSM8k (5-shot)                   |46.85|
+
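
The Avg. row in the table added by this commit can be sanity-checked against the six benchmark scores. The sketch below assumes the average is an unweighted mean rounded to two decimals, which the diff does not state explicitly; the function name `leaderboard_average` is hypothetical and is not part of any leaderboard tooling.

```python
# Scores copied from the results table in this commit's README.md diff.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 67.58,
    "HellaSwag (10-Shot)": 84.30,
    "MMLU (5-Shot)": 61.95,
    "TruthfulQA (0-shot)": 63.94,
    "Winogrande (5-shot)": 78.14,
    "GSM8k (5-shot)": 46.85,
}


def leaderboard_average(values):
    """Unweighted mean rounded to two decimals (an assumed convention)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. {average:.2f}")  # prints "Avg. 67.13"
```

Under that assumption the computed mean matches the 67.13 reported in the table, so the per-benchmark values and the Avg. row are consistent with each other.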