Text Generation · Transformers · PyTorch · English · llama · Eval Results · text-generation-inference · Inference Endpoints
Pankaj Mathur committed
Commit 8aded8e
1 Parent(s): 0c3d4df

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -26,14 +26,10 @@ Here are the zero shot metrics results.
 |:------:|:-------------:|:---------:|:--------:|:-------:|:--------:|
 |**Task**|**num_fewshot**|**Version**|**Metric**|**Value**|**Stderr**|
 |*arc_easy*|0|0|acc|0.7386|0.0090|
-|*arc_easy*|0|0|acc_norm|0.7066|0.0093|
-|*hellaswag*|0|0|acc|0.5591|0.0050|
 |*hellaswag*|0|0|acc_norm|0.7394|0.0044|
-|*truthfulqa_mc*|0|1|mc1|0.2938|0.0159|
 |*truthfulqa_mc*|0|1|mc2|0.4399|0.0153|
-|*mmlu avg*|0|1|acc|0.4108|0.0153|
-|*mmlu avg*|0|1|acc_norm|0.4108|0.0153|
-|*Total Zero Shot Average*|0|-|-|0.5373|0.011|
+|*mmlu*|0|1|acc_norm|0.4108|0.0153|
+|*Total Zero Shot Average*|0|-|-|0.5821|0.011|
 
 
 Here are the results on metrics used by [HuggingFaceH4 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
@@ -43,8 +39,12 @@ please note num_fewshots varies for each below task as used by HuggingFaceH4 Open LLM Leaderboard
 |||||||
 |:------:|:-------------:|:---------:|:--------:|:-------:|:--------:|
 |**Task**|**num_fewshot**|**Version**|**Metric**|**Value**|**Stderr**|
-|*arc_challenge*|25|0|acc|0.4846|0.0146|
 |*arc_challenge*|25|0|acc_norm|0.5077|0.0146|
+|*hellaswag*|10|0|acc_norm|0.7617|0.0043|
+|*mmlu*|5|0|acc_norm|-|-|
+|*truthfulqa_mc*|0|1|mc2|0.4399|0.0153|
+|*Total Average*|0|-|-|0.5697|0.0114|
+
 
 
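
For readers checking the updated totals against the per-task rows, here is a quick sketch, assuming each "Total" row is a plain unweighted mean of the task scores listed above it and that the mmlu row reported as "-" is simply left out of the leaderboard-style average (neither assumption is stated in the diff itself):

```python
# Sketch only: assumes the "Total" rows are unweighted means of the per-task
# scores shown in the updated tables (an assumption, not stated in the commit).

zero_shot = [0.7386, 0.7394, 0.4399, 0.4108]    # arc_easy, hellaswag, truthfulqa_mc, mmlu
print(round(sum(zero_shot) / len(zero_shot), 4))       # 0.5822, vs. 0.5821 reported

# Leaderboard-style table: mmlu is "-" in the diff, so it is excluded here.
leaderboard = [0.5077, 0.7617, 0.4399]          # arc_challenge, hellaswag, truthfulqa_mc
print(round(sum(leaderboard) / len(leaderboard), 4))   # 0.5698, vs. 0.5697 reported
```

The small last-digit differences are consistent with the reported totals being truncated rather than rounded.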
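
The task names, versions, and per-task num_fewshot values in these tables follow EleutherAI's lm-evaluation-harness. Below is a hedged sketch, not the author's actual command, of how the leaderboard-style settings might be rerun with a 0.3-era harness (whose task names, like `truthfulqa_mc`, match the tables); `psmathur/your-model` is a placeholder id, and the exact harness revision and model arguments behind these numbers are not given in the diff.

```python
# Hedged sketch: rerunning the leaderboard-style settings with EleutherAI's
# lm-evaluation-harness. API details vary by release; this follows the
# 0.3-era interface. "psmathur/your-model" is a placeholder, not the repo
# this commit belongs to.
from lm_eval import evaluator

FEWSHOT = {            # per-task num_fewshot, as listed in the second table
    "arc_challenge": 25,
    "hellaswag": 10,
    "truthfulqa_mc": 0,
}

for task, shots in FEWSHOT.items():
    results = evaluator.simple_evaluate(
        model="hf-causal",                         # Hugging Face causal-LM backend
        model_args="pretrained=psmathur/your-model",
        tasks=[task],
        num_fewshot=shots,
    )
    print(task, results["results"][task])
```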