theprint leaderboard-pr-bot committed on
Commit 797b7dd
1 Parent(s): ef7c757

Adding Evaluation Results (#1)


- Adding Evaluation Results (ba473a22cf71bf125937b489cc29375968870ebe)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1):
  1. README.md +111 -4
README.md CHANGED

@@ -9,11 +9,105 @@ tags:
 - h2o-llmstudio
 - theprint
 - boptruth
-inference: false
-thumbnail: >-
-  https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
 datasets:
 - theprint/MysteryWriter
+inference: false
+thumbnail: https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
+model-index:
+- name: Boptruth-Agatha-7B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 31.24
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=theprint/Boptruth-Agatha-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 29.29
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=theprint/Boptruth-Agatha-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 4.61
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=theprint/Boptruth-Agatha-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 6.6
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=theprint/Boptruth-Agatha-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 11.76
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=theprint/Boptruth-Agatha-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 20.67
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=theprint/Boptruth-Agatha-7B
+      name: Open LLM Leaderboard
 ---
 # Model Card
 ## Summary
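Once merged, the model-index block added above is machine-readable, not just documentation. A minimal sketch of reading the scores back out of the card, assuming the `huggingface_hub` Python package (its `ModelCard.load` parses the card's YAML front matter and exposes model-index entries as `EvalResult` objects):

```python
from huggingface_hub import ModelCard

# Fetch the model card for the target repo and parse its YAML front matter.
card = ModelCard.load("theprint/Boptruth-Agatha-7B")

# Each model-index result is exposed as a structured EvalResult object.
for result in card.data.eval_results:
    print(f"{result.dataset_name}: {result.metric_value} ({result.metric_type})")
```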
 
@@ -189,4 +283,17 @@ Please read this disclaimer carefully before using the large language model prov
 - Reporting Issues: If you encounter any biased, offensive, or otherwise inappropriate content generated by the large language model, please report it to the repository maintainers through the provided channels. Your feedback will help improve the model and mitigate potential issues.
 - Changes to this Disclaimer: The developers of this repository reserve the right to modify or update this disclaimer at any time without prior notice. It is the user's responsibility to periodically review the disclaimer to stay informed about any changes.
 
-By using the large language model provided in this repository, you agree to accept and comply with the terms and conditions outlined in this disclaimer. If you do not agree with any part of this disclaimer, you should refrain from using the model and any content generated by it.
+By using the large language model provided in this repository, you agree to accept and comply with the terms and conditions outlined in this disclaimer. If you do not agree with any part of this disclaimer, you should refrain from using the model and any content generated by it.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_theprint__Boptruth-Agatha-7B)
+
+| Metric            |Value|
+|-------------------|----:|
+|Avg.               |17.36|
+|IFEval (0-Shot)    |31.24|
+|BBH (3-Shot)       |29.29|
+|MATH Lvl 5 (4-Shot)| 4.61|
+|GPQA (0-shot)      | 6.60|
+|MuSR (0-shot)      |11.76|
+|MMLU-PRO (5-shot)  |20.67|
+
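As a quick sanity check on the table above, the Avg. row is the unweighted mean of the six benchmark scores:

```python
# Recompute the reported leaderboard average from the six benchmark scores.
scores = {
    "IFEval (0-Shot)": 31.24,
    "BBH (3-Shot)": 29.29,
    "MATH Lvl 5 (4-Shot)": 4.61,
    "GPQA (0-shot)": 6.60,
    "MuSR (0-shot)": 11.76,
    "MMLU-PRO (5-shot)": 20.67,
}
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")  # Avg. = 17.36, matching the table
```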