---
license: apache-2.0
datasets:
  - itsliupeng/mmlu_recall
language:
  - en
pipeline_tag: text-generation
---

We continue training the Llama-2-7b-hf model on the mmlu_recall dataset, aiming to improve MMLU scores without degrading performance on the other benchmarks.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

Benchmark (few-shot setting)   Score (0-100)
Avg.                           46.31
ARC (25-shot)                  56.14
HellaSwag (10-shot)            79.13
MMLU (5-shot)                  60.04
TruthfulQA (0-shot)            40.95
Winogrande (5-shot)            74.43
GSM8K (5-shot)                  7.88
DROP (3-shot)                   5.59
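As a quick consistency check, the Avg. row can be reproduced from the seven benchmark scores. This minimal sketch assumes Avg. is the unweighted arithmetic mean of the seven scores (which does reproduce the reported 46.31); the helper name `leaderboard_average` is hypothetical and not part of any leaderboard tooling.

```python
# Scores copied from the table above.
scores = {
    "ARC (25-shot)": 56.14,
    "HellaSwag (10-shot)": 79.13,
    "MMLU (5-shot)": 60.04,
    "TruthfulQA (0-shot)": 40.95,
    "Winogrande (5-shot)": 74.43,
    "GSM8K (5-shot)": 7.88,
    "DROP (3-shot)": 5.59,
}


def leaderboard_average(results: dict[str, float]) -> float:
    """Hypothetical helper: unweighted mean of the scores, rounded to 2 decimals.

    Assumes every benchmark counts equally toward the average.
    """
    return round(sum(results.values()) / len(results), 2)


if __name__ == "__main__":
    print(leaderboard_average(scores))  # → 46.31
```

The sum of the seven scores is 324.16, and 324.16 / 7 ≈ 46.3086, which rounds to the reported 46.31.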