Commit
2312c58
1 Parent(s): 7d7e745

Adding Evaluation Results (#2)

- Adding Evaluation Results (7537045377ddd335ace535864f51770dddcb7641)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +129 -16
README.md CHANGED
@@ -1,26 +1,126 @@
1
  ---
2
- license: llama2
3
  language:
4
  - en
 
5
  library_name: transformers
6
- pipeline_tag: text-generation
7
- model-index:
8
- - name: CodeMate-v0.1
9
- results:
10
- - task:
11
- type: text-generation
12
- dataset:
13
- name: HumanEval
14
- type: openai_humaneval
15
- metrics:
16
- - name: pass@1
17
- type: pass@1
18
- value: 74.9%
19
- verified: false
20
  tags:
21
  - CodeMate
22
  - Code
23
  - CodeLLaMa
24
  ---
25
 
26
 
@@ -83,4 +183,17 @@ tokenizer = AutoTokenizer.from_pretrained(model_path)
83
 
84
  This model has undergone very limited testing. CodeMate recommends additional safety testing before any real-world deployments.
85
 
86
- For more information and updates, visit the [CodeMate website](https://codemate.ai).
1
  ---
 
2
  language:
3
  - en
4
+ license: llama2
5
  library_name: transformers
6
  tags:
7
  - CodeMate
8
  - Code
9
  - CodeLLaMa
10
+ pipeline_tag: text-generation
11
+ model-index:
12
+ - name: CodeMate-v0.1
13
+ results:
14
+ - task:
15
+ type: text-generation
16
+ dataset:
17
+ name: HumanEval
18
+ type: openai_humaneval
19
+ metrics:
20
+ - type: pass@1
21
+ value: 74.9%
22
+ name: pass@1
23
+ verified: false
24
+ - task:
25
+ type: text-generation
26
+ name: Text Generation
27
+ dataset:
28
+ name: AI2 Reasoning Challenge (25-Shot)
29
+ type: ai2_arc
30
+ config: ARC-Challenge
31
+ split: test
32
+ args:
33
+ num_few_shot: 25
34
+ metrics:
35
+ - type: acc_norm
36
+ value: 55.55
37
+ name: normalized accuracy
38
+ source:
39
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=codemateai/CodeMate-v0.1
40
+ name: Open LLM Leaderboard
41
+ - task:
42
+ type: text-generation
43
+ name: Text Generation
44
+ dataset:
45
+ name: HellaSwag (10-Shot)
46
+ type: hellaswag
47
+ split: validation
48
+ args:
49
+ num_few_shot: 10
50
+ metrics:
51
+ - type: acc_norm
52
+ value: 78.03
53
+ name: normalized accuracy
54
+ source:
55
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=codemateai/CodeMate-v0.1
56
+ name: Open LLM Leaderboard
57
+ - task:
58
+ type: text-generation
59
+ name: Text Generation
60
+ dataset:
61
+ name: MMLU (5-Shot)
62
+ type: cais/mmlu
63
+ config: all
64
+ split: test
65
+ args:
66
+ num_few_shot: 5
67
+ metrics:
68
+ - type: acc
69
+ value: 55.31
70
+ name: accuracy
71
+ source:
72
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=codemateai/CodeMate-v0.1
73
+ name: Open LLM Leaderboard
74
+ - task:
75
+ type: text-generation
76
+ name: Text Generation
77
+ dataset:
78
+ name: TruthfulQA (0-shot)
79
+ type: truthful_qa
80
+ config: multiple_choice
81
+ split: validation
82
+ args:
83
+ num_few_shot: 0
84
+ metrics:
85
+ - type: mc2
86
+ value: 48.64
87
+ source:
88
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=codemateai/CodeMate-v0.1
89
+ name: Open LLM Leaderboard
90
+ - task:
91
+ type: text-generation
92
+ name: Text Generation
93
+ dataset:
94
+ name: Winogrande (5-shot)
95
+ type: winogrande
96
+ config: winogrande_xl
97
+ split: validation
98
+ args:
99
+ num_few_shot: 5
100
+ metrics:
101
+ - type: acc
102
+ value: 72.61
103
+ name: accuracy
104
+ source:
105
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=codemateai/CodeMate-v0.1
106
+ name: Open LLM Leaderboard
107
+ - task:
108
+ type: text-generation
109
+ name: Text Generation
110
+ dataset:
111
+ name: GSM8k (5-shot)
112
+ type: gsm8k
113
+ config: main
114
+ split: test
115
+ args:
116
+ num_few_shot: 5
117
+ metrics:
118
+ - type: acc
119
+ value: 40.18
120
+ name: accuracy
121
+ source:
122
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=codemateai/CodeMate-v0.1
123
+ name: Open LLM Leaderboard
124
  ---
125
 
126
 
 
183
 
184
  This model has undergone very limited testing. CodeMate recommends additional safety testing before any real-world deployments.
185
 
186
+ For more information and updates, visit the [CodeMate website](https://codemate.ai).
187
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
188
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_codemateai__CodeMate-v0.1)
189
+
190
+ | Metric |Value|
191
+ |---------------------------------|----:|
192
+ |Avg. |58.39|
193
+ |AI2 Reasoning Challenge (25-Shot)|55.55|
194
+ |HellaSwag (10-Shot) |78.03|
195
+ |MMLU (5-Shot) |55.31|
196
+ |TruthfulQA (0-shot) |48.64|
197
+ |Winogrande (5-shot) |72.61|
198
+ |GSM8k (5-shot) |40.18|
199
+
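
The `Avg.` row in the table added by this commit appears to be the unweighted mean of the six benchmark scores. The following is a minimal sketch for checking that, assuming simple averaging and rounding to two decimal places; the names `scores` and `average` are illustrative and not part of the commit.

```python
# Scores copied from the results table added to README.md in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 55.55,
    "HellaSwag (10-Shot)": 78.03,
    "MMLU (5-Shot)": 55.31,
    "TruthfulQA (0-shot)": 48.64,
    "Winogrande (5-shot)": 72.61,
    "GSM8k (5-shot)": 40.18,
}

# Assumption: "Avg." is the plain arithmetic mean, rounded to two decimals.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 58.39
```

Under that assumption the computed value matches the 58.39 reported in the table. The HumanEval pass@1 figure in the `model-index` metadata is not part of this average.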