Update README.md #1
by lilloukas - opened

README.md CHANGED
@@ -17,11 +17,11 @@ SuperPlatty-30B is a merge of [lilloukas/Platypus-30B](https://huggingface.co/li
| Metric                | Value |
|-----------------------|-------|
-| MMLU (5-shot)         |       |
-| ARC (25-shot)         |       |
-| HellaSwag (10-shot)   |       |
-| TruthfulQA (0-shot)   |       |
-| Avg.                  |       |
+| MMLU (5-shot)         | 62.6  |
+| ARC (25-shot)         | 66.1  |
+| HellaSwag (10-shot)   | 83.9  |
+| TruthfulQA (0-shot)   | 54.0  |
+| Avg.                  | 66.6  |

We use the state-of-the-art EleutherAI [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above.
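A quick sanity check on the new Avg. row: it should be the mean of the four metrics above. A two-line recomputation in plain Python (the one-decimal rounding is the card's convention, assumed here, not something the harness emits):

```python
# Recompute the Avg. row from the four metrics added in this hunk.
scores = {"MMLU": 62.6, "ARC": 66.1, "HellaSwag": 83.9, "TruthfulQA": 54.0}
print(f"{sum(scores.values()) / len(scores):.2f}")  # 66.65, shown in the table as 66.6
```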
@@ -51,22 +51,22 @@ Each task was evaluated on a single A100 80GB GPU.
ARC:
```
-python main.py --model hf-causal-experimental --model_args pretrained=
+python main.py --model hf-causal-experimental --model_args pretrained=ariellee/SuperPlatty-30B --tasks arc_challenge --batch_size 1 --no_cache --write_out --output_path results/SuperPlatty-30B/arc_challenge_25shot.json --device cuda --num_fewshot 25
```

HellaSwag:
```
-python main.py --model hf-causal-experimental --model_args pretrained=
+python main.py --model hf-causal-experimental --model_args pretrained=ariellee/SuperPlatty-30B --tasks hellaswag --batch_size 1 --no_cache --write_out --output_path results/SuperPlatty-30B/hellaswag_10shot.json --device cuda --num_fewshot 10
```

MMLU:
```
-python main.py --model hf-causal-experimental --model_args pretrained=
+python main.py --model hf-causal-experimental --model_args pretrained=ariellee/SuperPlatty-30B --tasks hendrycksTest-* --batch_size 1 --no_cache --write_out --output_path results/SuperPlatty-30B/mmlu_5shot.json --device cuda --num_fewshot 5
```

TruthfulQA:
```
-python main.py --model hf-causal-experimental --model_args pretrained=
+python main.py --model hf-causal-experimental --model_args pretrained=ariellee/SuperPlatty-30B --tasks truthfulqa_mc --batch_size 1 --no_cache --write_out --output_path results/SuperPlatty-30B/truthfulqa_0shot.json --device cuda
```
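The four added commands differ only in task name, few-shot count, and output file, so they can be driven from one loop. A minimal sketch, assuming the same setup the commands themselves assume (a local clone of lm-evaluation-harness with `main.py` at its root, and a GPU with enough memory for the 30B model); the explicit `--num_fewshot 0` for TruthfulQA mirrors the harness default that the original command leaves implicit:

```python
# Run all four leaderboard evaluations sequentially.
# Assumes a local clone of EleutherAI/lm-evaluation-harness (the main.py-era
# CLI used above) as the working directory.
import subprocess

MODEL = "ariellee/SuperPlatty-30B"
RUNS = [
    # (tasks, num_fewshot, output file)
    ("arc_challenge", 25, "arc_challenge_25shot.json"),
    ("hellaswag", 10, "hellaswag_10shot.json"),
    ("hendrycksTest-*", 5, "mmlu_5shot.json"),  # passed literally; no shell globbing here
    ("truthfulqa_mc", 0, "truthfulqa_0shot.json"),
]

for tasks, shots, outfile in RUNS:
    subprocess.run(
        [
            "python", "main.py",
            "--model", "hf-causal-experimental",
            "--model_args", f"pretrained={MODEL}",
            "--tasks", tasks,
            "--batch_size", "1",
            "--no_cache",
            "--write_out",
            "--output_path", f"results/SuperPlatty-30B/{outfile}",
            "--device", "cuda",
            "--num_fewshot", str(shots),
        ],
        check=True,
    )
```

Each run writes its scores to the JSON file named by `--output_path`. A sketch of reading the headline numbers back out, assuming the harness's usual result layout (a top-level `"results"` dict keyed by task); the metric choices here (`acc_norm` for ARC and HellaSwag, `acc` for MMLU, `mc2` for TruthfulQA) follow Open LLM Leaderboard convention and are an assumption, not something stated in this card:

```python
# Pull the headline metrics out of the JSON files written via --output_path.
# Assumed layout: {"results": {task_name: {metric_name: value}}}.
import json

def load_results(path):
    with open(path) as f:
        return json.load(f)["results"]

arc = load_results("results/SuperPlatty-30B/arc_challenge_25shot.json")
hella = load_results("results/SuperPlatty-30B/hellaswag_10shot.json")
tqa = load_results("results/SuperPlatty-30B/truthfulqa_0shot.json")
mmlu = load_results("results/SuperPlatty-30B/mmlu_5shot.json")

print(f"ARC (25-shot):       {100 * arc['arc_challenge']['acc_norm']:.1f}")
print(f"HellaSwag (10-shot): {100 * hella['hellaswag']['acc_norm']:.1f}")
print(f"TruthfulQA (0-shot): {100 * tqa['truthfulqa_mc']['mc2']:.1f}")

# MMLU is reported as the unweighted mean over the hendrycksTest-* subtasks.
accs = [v["acc"] for k, v in mmlu.items() if k.startswith("hendrycksTest-")]
print(f"MMLU (5-shot):       {100 * sum(accs) / len(accs):.1f}")
```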
## Limitations and bias