chiliu committed
Commit
13d7a7a
1 Parent(s): 935f4d9

add benchmark

Files changed (2)
  1. .gitattributes +1 -0
  2. README.md +16 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -13,6 +13,22 @@ license: apache-2.0
  ---
  # Model Card

+ **The Best 3B Model! Surpassing dolly-v2-12b**
+
+ The best 3B model on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), with performance surpassing dolly-v2-12b.
+
+ | Metric              | Value |
+ |---------------------|-------|
+ | MMLU (5-shot)       | 27.1  |
+ | ARC (25-shot)       | 42.2  |
+ | HellaSwag (10-shot) | 71.5  |
+ | TruthfulQA (0-shot) | 36.7  |
+ | Avg.                | 44.4  |
+
+ We use the state-of-the-art [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmarks above.
+
+
+
  ## Summary

  We have fine-tuned the OpenLLaMA model and surpassed the original model on multiple evaluation subtasks, making it currently the best-performing 3B model, with performance comparable to llama-7b.
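
For context on the harness mentioned in the new README text, the sketch below shows one way numbers like these could be reproduced with the EleutherAI lm-evaluation-harness Python API. It is a minimal sketch under assumptions, not the author's actual commands: the model id is a placeholder (the commit does not name the checkpoint), the `hf-causal` wrapper and task names follow the harness's 2023-era (v0.3) interface, and MMLU is omitted because that release exposes it as the `hendrycksTest-*` task family. As a sanity check, the Avg. row is consistent with the other four values: (27.1 + 42.2 + 71.5 + 36.7) / 4 ≈ 44.4.

```python
# Hedged sketch, not the author's exact setup: scoring a model on the
# README's benchmarks with lm-evaluation-harness (v0.3-era Python API).
from lm_eval import evaluator

# Placeholder model id -- the commit does not name the actual checkpoint.
MODEL = "your-org/your-3b-model"

# Few-shot counts mirror the README table; task names follow v0.3 naming.
# MMLU is split across "hendrycksTest-*" subtasks in that release, so it
# is omitted from this sketch.
TASKS = {
    "arc_challenge": 25,
    "hellaswag": 10,
    "truthfulqa_mc": 0,
}

for task, shots in TASKS.items():
    results = evaluator.simple_evaluate(
        model="hf-causal",                  # harness wrapper for HF causal LMs
        model_args=f"pretrained={MODEL}",
        tasks=[task],
        num_fewshot=shots,
    )
    # Each task reports its own metrics (e.g. acc, acc_norm, mc2).
    print(task, results["results"][task])
```

On the Open LLM Leaderboard of that period, ARC and HellaSwag were scored by acc_norm and TruthfulQA by mc2, which is presumably how the table's values were selected.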