Adding Evaluation Results

#6
Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -323,4 +323,17 @@ Stage 2 dataset statistics:
  --rope_scaling_factor 1.0
  --finetune
  --wandb_logger
- ```
+ ```
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_OpenAssistant__llama2-70b-oasst-sft-v10)
+
+ | Metric | Value |
+ |-----------------------|---------------------------|
+ | Avg. | 57.58 |
+ | ARC (25-shot) | 67.06 |
+ | HellaSwag (10-shot) | 86.38 |
+ | MMLU (5-shot) | 67.7 |
+ | TruthfulQA (0-shot) | 56.45 |
+ | Winogrande (5-shot) | 82.0 |
+ | GSM8K (5-shot) | 27.22 |
+ | DROP (3-shot) | 16.28 |
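
The `Avg.` row is consistent with an unweighted mean of the seven benchmark scores in the table. The following is a minimal sketch for checking that arithmetic, assuming equal weighting across benchmarks (the diff itself does not state how the average is computed). The `scores` dictionary and the `unweighted_mean` helper are illustrative names, not part of the PR.

```python
# Scores copied from the table added in this diff.
scores = {
    "ARC (25-shot)": 67.06,
    "HellaSwag (10-shot)": 86.38,
    "MMLU (5-shot)": 67.7,
    "TruthfulQA (0-shot)": 56.45,
    "Winogrande (5-shot)": 82.0,
    "GSM8K (5-shot)": 27.22,
    "DROP (3-shot)": 16.28,
}


def unweighted_mean(values):
    """Return the arithmetic mean, weighting every benchmark equally (an assumption)."""
    values = list(values)
    return sum(values) / len(values)


# Round to two decimals to match the precision of the reported "Avg." row.
average = round(unweighted_mean(scores.values()), 2)
print(average)
```

If the printed value matches the `Avg.` row, the table is internally consistent under the equal-weighting assumption.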