152334H committed
Commit daa926d
1 Parent(s): 8a13b5a

Update README.md

Files changed (1): README.md (+91 −2)
README.md CHANGED
@@ -39,6 +39,95 @@ Indeed, the cute catgirl is a paradox wrapped in ruffles and ribbons, a living e
 So let us raise our teacups in honor of this fabulous feline, this queen of camp who reminds us that life is too short for dull clothing and boring hairstyles. May we all strive to embody her spirit, embracing the absurdity of existence with open arms and a generous helping of glitter. Long live the cute catgirl! [end of text]
 ```
 
- exl2 3.0bpw coming soon
 
- ![](https://thicc-af.mywaifulist.moe/waifus/miku-nakano-the-quintessential-quintuplets/phUEiEhPOL75GTDLncGy2dUbkDVMfYExZ2A1RBeQ.png?class=thumbnail)
+ ![](https://thicc-af.mywaifulist.moe/waifus/miku-nakano-the-quintessential-quintuplets/phUEiEhPOL75GTDLncGy2dUbkDVMfYExZ2A1RBeQ.png?class=thumbnail)
 
+ some benchmarks
+
+ ```
+ | Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
+ |--------------|------:|------|-----:|----------|-----:|---|-----:|
+ |lambada_openai| 1|none | 0|perplexity|2.6354|± |0.0451|
+ | | |none | 0|acc |0.7879|± |0.0057|
+
+
+ | Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
+ |---------|------:|------|-----:|--------|-----:|---|-----:|
+ |hellaswag| 1|none | 0|acc |0.6851|± |0.0046|
+ | | |none | 0|acc_norm|0.8690|± |0.0034|
+
+ | Tasks |Version|Filter|n-shot|Metric|Value | |Stderr|
+ |----------|------:|------|-----:|------|-----:|---|-----:|
+ |winogrande| 1|none | 0|acc |0.7987|± |0.0113|
+
+ |Tasks|Version| Filter |n-shot| Metric |Value | |Stderr|
+ |-----|------:|----------|-----:|-----------|-----:|---|-----:|
+ |gsm8k| 2|get-answer| 5|exact_match|0.7043|± |0.0126|
+
+ | Tasks |Version|Filter|n-shot|Metric|Value | |Stderr|
+ |---------------------------------------|-------|------|-----:|------|-----:|---|-----:|
+ |mmlu |N/A |none | 0|acc |0.7401|± |0.1192|
+ | - humanities |N/A |none | 0|acc |0.7018|± |0.1281|
+ | - formal_logic | 0|none | 0|acc |0.4841|± |0.0447|
+ | - high_school_european_history | 0|none | 0|acc |0.8303|± |0.0293|
+ | - high_school_us_history | 0|none | 0|acc |0.9020|± |0.0209|
+ | - high_school_world_history | 0|none | 0|acc |0.9198|± |0.0177|
+ | - international_law | 0|none | 0|acc |0.8678|± |0.0309|
+ | - jurisprudence | 0|none | 0|acc |0.8519|± |0.0343|
+ | - logical_fallacies | 0|none | 0|acc |0.8344|± |0.0292|
+ | - moral_disputes | 0|none | 0|acc |0.8121|± |0.0210|
+ | - moral_scenarios | 0|none | 0|acc |0.5642|± |0.0166|
+ | - philosophy | 0|none | 0|acc |0.8167|± |0.0220|
+ | - prehistory | 0|none | 0|acc |0.8611|± |0.0192|
+ | - professional_law | 0|none | 0|acc |0.5854|± |0.0126|
+ | - world_religions | 0|none | 0|acc |0.8889|± |0.0241|
+ | - other |N/A |none | 0|acc |0.7889|± |0.0922|
+ | - business_ethics | 0|none | 0|acc |0.7900|± |0.0409|
+ | - clinical_knowledge | 0|none | 0|acc |0.8113|± |0.0241|
+ | - college_medicine | 0|none | 0|acc |0.7514|± |0.0330|
+ | - global_facts | 0|none | 0|acc |0.5500|± |0.0500|
+ | - human_aging | 0|none | 0|acc |0.7848|± |0.0276|
+ | - management | 0|none | 0|acc |0.8835|± |0.0318|
+ | - marketing | 0|none | 0|acc |0.9145|± |0.0183|
+ | - medical_genetics | 0|none | 0|acc |0.7500|± |0.0435|
+ | - miscellaneous | 0|none | 0|acc |0.8838|± |0.0115|
+ | - nutrition | 0|none | 0|acc |0.7974|± |0.0230|
+ | - professional_accounting | 0|none | 0|acc |0.5922|± |0.0293|
+ | - professional_medicine | 0|none | 0|acc |0.8272|± |0.0230|
+ | - virology | 0|none | 0|acc |0.5361|± |0.0388|
+ | - social_sciences |N/A |none | 0|acc |0.8414|± |0.0514|
+ | - econometrics | 0|none | 0|acc |0.6491|± |0.0449|
+ | - high_school_geography | 0|none | 0|acc |0.8990|± |0.0215|
+ | - high_school_government_and_politics| 0|none | 0|acc |0.9430|± |0.0167|
+ | - high_school_macroeconomics | 0|none | 0|acc |0.7795|± |0.0210|
+ | - high_school_microeconomics | 0|none | 0|acc |0.8277|± |0.0245|
+ | - high_school_psychology | 0|none | 0|acc |0.9064|± |0.0125|
+ | - human_sexuality | 0|none | 0|acc |0.8626|± |0.0302|
+ | - professional_psychology | 0|none | 0|acc |0.8056|± |0.0160|
+ | - public_relations | 0|none | 0|acc |0.7636|± |0.0407|
+ | - security_studies | 0|none | 0|acc |0.8204|± |0.0246|
+ | - sociology | 0|none | 0|acc |0.8856|± |0.0225|
+ | - us_foreign_policy | 0|none | 0|acc |0.9100|± |0.0288|
+ | - stem |N/A |none | 0|acc |0.6505|± |0.1266|
+ | - abstract_algebra | 0|none | 0|acc |0.4100|± |0.0494|
+ | - anatomy | 0|none | 0|acc |0.6444|± |0.0414|
+ | - astronomy | 0|none | 0|acc |0.8224|± |0.0311|
+ | - college_biology | 0|none | 0|acc |0.8681|± |0.0283|
+ | - college_chemistry | 0|none | 0|acc |0.5500|± |0.0500|
+ | - college_computer_science | 0|none | 0|acc |0.6200|± |0.0488|
+ | - college_mathematics | 0|none | 0|acc |0.4200|± |0.0496|
+ | - college_physics | 0|none | 0|acc |0.5392|± |0.0496|
+ | - computer_security | 0|none | 0|acc |0.8300|± |0.0378|
+ | - conceptual_physics | 0|none | 0|acc |0.7362|± |0.0288|
+ | - electrical_engineering | 0|none | 0|acc |0.7034|± |0.0381|
+ | - elementary_mathematics | 0|none | 0|acc |0.5503|± |0.0256|
+ | - high_school_biology | 0|none | 0|acc |0.8742|± |0.0189|
+ | - high_school_chemistry | 0|none | 0|acc |0.6256|± |0.0341|
+ | - high_school_computer_science | 0|none | 0|acc |0.8400|± |0.0368|
+ | - high_school_mathematics | 0|none | 0|acc |0.4370|± |0.0302|
+ | - high_school_physics | 0|none | 0|acc |0.5033|± |0.0408|
+ | - high_school_statistics | 0|none | 0|acc |0.6944|± |0.0314|
+ | - machine_learning | 0|none | 0|acc |0.5982|± |0.0465|
+ ```
+ No, I do not know why the stderr is high; plausibly it is due to the vLLM backend used. This is my lm-eval command in most cases (works on an H100):
+
+ `lm_eval --model vllm --model_args pretrained=./miqu-1-70b-sf,tensor_parallel_size=4,dtype=auto,gpu_memory_utilization=0.88,data_parallel_size=2 --tasks mmlu --batch_size 20`
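
One observation on the high stderr: the per-subtask stderr values above are all around 0.01–0.05, while the mmlu group aggregates report ≈0.09–0.13 — roughly the size of the *spread* between subtask accuracies. A minimal sketch (plain Python, accuracies copied from the stem rows above) showing that spread; this is a hypothesis about where the large number could come from, not a statement about how lm-eval actually pools group stderr:

```python
import statistics

# Accuracies of the 19 mmlu stem subtasks, copied from the table above.
stem_accs = [
    0.4100, 0.6444, 0.8224, 0.8681, 0.5500, 0.6200, 0.4200,
    0.5392, 0.8300, 0.7362, 0.7034, 0.5503, 0.8742, 0.6256,
    0.8400, 0.4370, 0.5033, 0.6944, 0.5982,
]

# Mean is close to the reported stem acc (0.6505); the sample stdev
# across subtasks is the same order as the reported stem stderr (0.1266),
# i.e. the aggregate stderr looks dominated by between-subtask spread
# rather than per-sample sampling noise.
print(f"mean={statistics.mean(stem_accs):.4f}  stdev={statistics.stdev(stem_accs):.4f}")
```

If that hypothesis is right, the large stderr says the subtasks disagree a lot, not that the measurement itself is noisy.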