blair-johnson committed on
Commit 75cead6 • 1 Parent(s): e9839ea
Update README.md
README.md CHANGED
@@ -111,11 +111,13 @@ Common benchmark scores generated using the [Eleuther AI LLM Evaluation Harness]
 
 | Task | Version | Metric | Value | Stderr |
 |------|---------|--------|-------|--------|
+| MMLU 5-shot | 1 | acc | 0.4420 | |
 | arc_challenge 25-shot | 0 | acc | 0.4684 | 0.146 |
 | | | acc_norm | 0.4787 | 0.146 |
 |hellaswag 10-shot| 0 | acc | 0.4705 | 0.0050 |
 | | | acc_norm | 0.6111 | 0.0049 |
 
+
 Qualitative evaluation suggests that the evol-instruct-70k fine-tuned Galactica models are signficantly more controllable and attentive to user prompts than the Alpaca fine-tuned GALPACA models.
 
 ## Works Cited