Running a long time (#653)
opened by dnhkng
Could someone check this run:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Infinimol/miiqu-f16_eval_request_False_float16_Original.json
It failed last time, and I think it may have failed again during the model download.
Hi!
Your model actually finished; I've put your scores below.
They should be pushed today to the hub (it's a separate step in our backend).
Feel free to reopen if they are not pushed tomorrow.
| Task |Version| Metric |Value | |Stderr|
|-----------------------------------------------------------|------:|--------|-----:|---|-----:|
|harness:arc:challenge:25 | 0|acc |0.6860|± |0.0136|
| | |acc_norm|0.7287|± |0.0130|
|harness:hellaswag:10 | 0|acc |0.7149|± |0.0045|
| | |acc_norm|0.8897|± |0.0031|
|harness:hendrycksTest-abstract_algebra:5 | 1|acc |0.3900|± |0.0490|
| | |acc_norm|0.3900|± |0.0490|
|harness:hendrycksTest-anatomy:5 | 1|acc |0.6667|± |0.0407|
| | |acc_norm|0.6667|± |0.0407|
|harness:hendrycksTest-astronomy:5 | 1|acc |0.8618|± |0.0281|
| | |acc_norm|0.8618|± |0.0281|
|harness:hendrycksTest-business_ethics:5 | 1|acc |0.8000|± |0.0402|
| | |acc_norm|0.8000|± |0.0402|
|harness:hendrycksTest-clinical_knowledge:5 | 1|acc |0.7849|± |0.0253|
| | |acc_norm|0.7849|± |0.0253|
|harness:hendrycksTest-college_biology:5 | 1|acc |0.9097|± |0.0240|
| | |acc_norm|0.9097|± |0.0240|
|harness:hendrycksTest-college_chemistry:5 | 1|acc |0.5400|± |0.0501|
| | |acc_norm|0.5400|± |0.0501|
|harness:hendrycksTest-college_computer_science:5 | 1|acc |0.6800|± |0.0469|
| | |acc_norm|0.6800|± |0.0469|
|harness:hendrycksTest-college_mathematics:5 | 1|acc |0.4900|± |0.0502|
| | |acc_norm|0.4900|± |0.0502|
|harness:hendrycksTest-college_medicine:5 | 1|acc |0.7399|± |0.0335|
| | |acc_norm|0.7399|± |0.0335|
|harness:hendrycksTest-college_physics:5 | 1|acc |0.4804|± |0.0497|
| | |acc_norm|0.4804|± |0.0497|
|harness:hendrycksTest-computer_security:5 | 1|acc |0.7900|± |0.0409|
| | |acc_norm|0.7900|± |0.0409|
|harness:hendrycksTest-conceptual_physics:5 | 1|acc |0.7660|± |0.0277|
| | |acc_norm|0.7660|± |0.0277|
|harness:hendrycksTest-econometrics:5 | 1|acc |0.6140|± |0.0458|
| | |acc_norm|0.6140|± |0.0458|
|harness:hendrycksTest-electrical_engineering:5 | 1|acc |0.7172|± |0.0375|
| | |acc_norm|0.7172|± |0.0375|
|harness:hendrycksTest-elementary_mathematics:5 | 1|acc |0.5476|± |0.0256|
| | |acc_norm|0.5476|± |0.0256|
|harness:hendrycksTest-formal_logic:5 | 1|acc |0.5714|± |0.0443|
| | |acc_norm|0.5714|± |0.0443|
|harness:hendrycksTest-global_facts:5 | 1|acc |0.5400|± |0.0501|
| | |acc_norm|0.5400|± |0.0501|
|harness:hendrycksTest-high_school_biology:5 | 1|acc |0.8774|± |0.0187|
| | |acc_norm|0.8774|± |0.0187|
|harness:hendrycksTest-high_school_chemistry:5 | 1|acc |0.6552|± |0.0334|
| | |acc_norm|0.6552|± |0.0334|
|harness:hendrycksTest-high_school_computer_science:5 | 1|acc |0.8400|± |0.0368|
| | |acc_norm|0.8400|± |0.0368|
|harness:hendrycksTest-high_school_european_history:5 | 1|acc |0.8545|± |0.0275|
| | |acc_norm|0.8545|± |0.0275|
|harness:hendrycksTest-high_school_geography:5 | 1|acc |0.8990|± |0.0215|
| | |acc_norm|0.8990|± |0.0215|
|harness:hendrycksTest-high_school_government_and_politics:5| 1|acc |0.9637|± |0.0135|
| | |acc_norm|0.9637|± |0.0135|
|harness:hendrycksTest-high_school_macroeconomics:5 | 1|acc |0.7718|± |0.0213|
| | |acc_norm|0.7718|± |0.0213|
|harness:hendrycksTest-high_school_mathematics:5 | 1|acc |0.4259|± |0.0301|
| | |acc_norm|0.4259|± |0.0301|
|harness:hendrycksTest-high_school_microeconomics:5 | 1|acc |0.8403|± |0.0238|
| | |acc_norm|0.8403|± |0.0238|
|harness:hendrycksTest-high_school_physics:5 | 1|acc |0.5695|± |0.0404|
| | |acc_norm|0.5695|± |0.0404|
|harness:hendrycksTest-high_school_psychology:5 | 1|acc |0.9193|± |0.0117|
| | |acc_norm|0.9193|± |0.0117|
|harness:hendrycksTest-high_school_statistics:5 | 1|acc |0.6852|± |0.0317|
| | |acc_norm|0.6852|± |0.0317|
|harness:hendrycksTest-high_school_us_history:5 | 1|acc |0.9020|± |0.0209|
| | |acc_norm|0.9020|± |0.0209|
|harness:hendrycksTest-high_school_world_history:5 | 1|acc |0.9030|± |0.0193|
| | |acc_norm|0.9030|± |0.0193|
|harness:hendrycksTest-human_aging:5 | 1|acc |0.8027|± |0.0267|
| | |acc_norm|0.8027|± |0.0267|
|harness:hendrycksTest-human_sexuality:5 | 1|acc |0.8550|± |0.0309|
| | |acc_norm|0.8550|± |0.0309|
|harness:hendrycksTest-international_law:5 | 1|acc |0.9256|± |0.0240|
| | |acc_norm|0.9256|± |0.0240|
|harness:hendrycksTest-jurisprudence:5 | 1|acc |0.8704|± |0.0325|
| | |acc_norm|0.8704|± |0.0325|
|harness:hendrycksTest-logical_fallacies:5 | 1|acc |0.8466|± |0.0283|
| | |acc_norm|0.8466|± |0.0283|
|harness:hendrycksTest-machine_learning:5 | 1|acc |0.6875|± |0.0440|
| | |acc_norm|0.6875|± |0.0440|
|harness:hendrycksTest-management:5 | 1|acc |0.8835|± |0.0318|
| | |acc_norm|0.8835|± |0.0318|
|harness:hendrycksTest-marketing:5 | 1|acc |0.9316|± |0.0165|
| | |acc_norm|0.9316|± |0.0165|
|harness:hendrycksTest-medical_genetics:5 | 1|acc |0.8000|± |0.0402|
| | |acc_norm|0.8000|± |0.0402|
|harness:hendrycksTest-miscellaneous:5 | 1|acc |0.8902|± |0.0112|
| | |acc_norm|0.8902|± |0.0112|
|harness:hendrycksTest-moral_disputes:5 | 1|acc |0.8497|± |0.0192|
| | |acc_norm|0.8497|± |0.0192|
|harness:hendrycksTest-moral_scenarios:5 | 1|acc |0.7698|± |0.0141|
| | |acc_norm|0.7698|± |0.0141|
|harness:hendrycksTest-nutrition:5 | 1|acc |0.8464|± |0.0206|
| | |acc_norm|0.8464|± |0.0206|
|harness:hendrycksTest-philosophy:5 | 1|acc |0.8264|± |0.0215|
| | |acc_norm|0.8264|± |0.0215|
|harness:hendrycksTest-prehistory:5 | 1|acc |0.8580|± |0.0194|
| | |acc_norm|0.8580|± |0.0194|
|harness:hendrycksTest-professional_accounting:5 | 1|acc |0.5887|± |0.0294|
| | |acc_norm|0.5887|± |0.0294|
|harness:hendrycksTest-professional_law:5 | 1|acc |0.6069|± |0.0125|
| | |acc_norm|0.6069|± |0.0125|
|harness:hendrycksTest-professional_medicine:5 | 1|acc |0.8162|± |0.0235|
| | |acc_norm|0.8162|± |0.0235|
|harness:hendrycksTest-professional_psychology:5 | 1|acc |0.8317|± |0.0151|
| | |acc_norm|0.8317|± |0.0151|
|harness:hendrycksTest-public_relations:5 | 1|acc |0.7545|± |0.0412|
| | |acc_norm|0.7545|± |0.0412|
|harness:hendrycksTest-security_studies:5 | 1|acc |0.8286|± |0.0241|
| | |acc_norm|0.8286|± |0.0241|
|harness:hendrycksTest-sociology:5 | 1|acc |0.9154|± |0.0197|
| | |acc_norm|0.9154|± |0.0197|
|harness:hendrycksTest-us_foreign_policy:5 | 1|acc |0.9100|± |0.0288|
| | |acc_norm|0.9100|± |0.0288|
|harness:hendrycksTest-virology:5 | 1|acc |0.5361|± |0.0388|
| | |acc_norm|0.5361|± |0.0388|
|harness:hendrycksTest-world_religions:5 | 1|acc |0.8889|± |0.0241|
| | |acc_norm|0.8889|± |0.0241|
|harness:truthfulqa:mc:0 | 1|mc1 |0.5300|± |0.0175|
| | |mc2 |0.6937|± |0.0149|
|harness:winogrande:5 | 0|acc |0.8556|± |0.0099|
|harness:gsm8k:5 | 0|acc |0.6785|± |0.0129|
|all | 0|acc |0.7582|± |0.0285|
| | |acc_norm|0.7616|± |0.0291|
| | |mc1 |0.5300|± |0.0175|
| | |mc2 |0.6937|± |0.0149|
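For anyone who wants a single MMLU number out of a table like the one above, here is a minimal Python sketch that parses the rows and averages the `hendrycksTest` subtasks. It assumes an unweighted mean over subtasks, which may not match exactly how the leaderboard backend aggregates; `parse_rows`, `mmlu_average`, and `SAMPLE` are hypothetical names used only for illustration, and `SAMPLE` holds just a few rows copied from the table.

```python
from statistics import mean

# A few rows copied from the table above (not the full results).
SAMPLE = """
| Task |Version| Metric |Value | |Stderr|
|------|------:|--------|-----:|---|-----:|
|harness:hendrycksTest-abstract_algebra:5 | 1|acc |0.3900|± |0.0490|
| | |acc_norm|0.3900|± |0.0490|
|harness:hendrycksTest-anatomy:5 | 1|acc |0.6667|± |0.0407|
| | |acc_norm|0.6667|± |0.0407|
|harness:winogrande:5 | 0|acc |0.8556|± |0.0099|
"""


def parse_rows(table):
    """Return (task, metric, value) tuples from a results table.

    Continuation rows have an empty Task cell, so they inherit the
    task name from the previous row.
    """
    rows = []
    task = None
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 6:
            continue
        try:
            value = float(cells[3])
        except ValueError:
            # Header and separator rows have no numeric Value cell.
            continue
        task = cells[0] or task
        rows.append((task, cells[2], value))
    return rows


def mmlu_average(table, metric="acc"):
    """Unweighted mean of `metric` over hendrycksTest subtasks (an assumption)."""
    values = [
        value
        for task, m, value in parse_rows(table)
        if task and task.startswith("harness:hendrycksTest-") and m == metric
    ]
    return mean(values)
```

Calling `mmlu_average(full_table_text)` on the complete table text would average all the `hendrycksTest` rows; the non-MMLU tasks (ARC, HellaSwag, TruthfulQA, Winogrande, GSM8K) are filtered out by the task-name prefix.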
clefourrier changed discussion status to closed
Just reopening as requested, as I don't see the results on the leaderboard.
clefourrier changed discussion status to open
Found it, I had to turn off a search setting.
dnhkng changed discussion status to closed