blair-johnson committed on
Commit
75cead6
1 Parent(s): e9839ea

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -111,11 +111,13 @@ Common benchmark scores generated using the [Eleuther AI LLM Evaluation Harness]
 
 | Task | Version | Metric | Value | Stderr |
 |------|---------|--------|-------|--------|
+| MMLU 5-shot | 1 | acc | 0.4420 | |
 | arc_challenge 25-shot | 0 | acc | 0.4684 | 0.146 |
 | | | acc_norm | 0.4787 | 0.146 |
 |hellaswag 10-shot| 0 | acc | 0.4705 | 0.0050 |
 | | | acc_norm | 0.6111 | 0.0049 |
 
+
 Qualitative evaluation suggests that the evol-instruct-70k fine-tuned Galactica models are signficantly more controllable and attentive to user prompts than the Alpaca fine-tuned GALPACA models.
 
 ## Works Cited