llama-2-34b-uncode / README.md
---
license: llama2
datasets:
  - the_pile_books3
  - togethercomputer/RedPajama-Data-1T-Sample
language:
  - en
---

Very early work-in-progress experiment.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 36.2  |
| ARC (25-shot)       | 39.51 |
| HellaSwag (10-shot) | 33.9  |
| MMLU (5-shot)       | 38.49 |
| TruthfulQA (0-shot) | 40.94 |
| Winogrande (5-shot) | 74.35 |
| GSM8K (5-shot)      | 20.77 |
| DROP (3-shot)       | 5.43  |
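
The `Avg.` row appears to be the unweighted mean of the seven benchmark scores listed above. The sketch below checks that arithmetic under that assumption; the `scores` dictionary and the `unweighted_mean` helper are illustrative names, not part of any leaderboard tooling.

```python
# Benchmark scores copied from the results listed above.
scores = {
    "ARC (25-shot)": 39.51,
    "HellaSwag (10-shot)": 33.9,
    "MMLU (5-shot)": 38.49,
    "TruthfulQA (0-shot)": 40.94,
    "Winogrande (5-shot)": 74.35,
    "GSM8K (5-shot)": 20.77,
    "DROP (3-shot)": 5.43,
}


def unweighted_mean(values):
    """Return the plain arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Assumption: "Avg." weights every benchmark equally.
average = unweighted_mean(scores.values())
print(f"{average:.2f}")  # prints 36.20
```

Rounded to one decimal place this gives 36.2, which matches the reported `Avg.` value.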