Running a long time

#653
by dnhkng - opened

Could someone check this run:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Infinimol/miiqu-f16_eval_request_False_float16_Original.json

It failed last time, and I think it may have failed again during the model download.

Open LLM Leaderboard org

Hi!
Your model actually finished; I've put your scores below.
They should be pushed today to the hub (it's a separate step in our backend).

Feel free to reopen if they are not pushed tomorrow.

|                           Task                            |Version| Metric |Value |   |Stderr|
|-----------------------------------------------------------|------:|--------|-----:|---|-----:|
|harness:arc:challenge:25                                   |      0|acc     |0.6860|±  |0.0136|
|                                                           |       |acc_norm|0.7287|±  |0.0130|
|harness:hellaswag:10                                       |      0|acc     |0.7149|±  |0.0045|
|                                                           |       |acc_norm|0.8897|±  |0.0031|
|harness:hendrycksTest-abstract_algebra:5                   |      1|acc     |0.3900|±  |0.0490|
|                                                           |       |acc_norm|0.3900|±  |0.0490|
|harness:hendrycksTest-anatomy:5                            |      1|acc     |0.6667|±  |0.0407|
|                                                           |       |acc_norm|0.6667|±  |0.0407|
|harness:hendrycksTest-astronomy:5                          |      1|acc     |0.8618|±  |0.0281|
|                                                           |       |acc_norm|0.8618|±  |0.0281|
|harness:hendrycksTest-business_ethics:5                    |      1|acc     |0.8000|±  |0.0402|
|                                                           |       |acc_norm|0.8000|±  |0.0402|
|harness:hendrycksTest-clinical_knowledge:5                 |      1|acc     |0.7849|±  |0.0253|
|                                                           |       |acc_norm|0.7849|±  |0.0253|
|harness:hendrycksTest-college_biology:5                    |      1|acc     |0.9097|±  |0.0240|
|                                                           |       |acc_norm|0.9097|±  |0.0240|
|harness:hendrycksTest-college_chemistry:5                  |      1|acc     |0.5400|±  |0.0501|
|                                                           |       |acc_norm|0.5400|±  |0.0501|
|harness:hendrycksTest-college_computer_science:5           |      1|acc     |0.6800|±  |0.0469|
|                                                           |       |acc_norm|0.6800|±  |0.0469|
|harness:hendrycksTest-college_mathematics:5                |      1|acc     |0.4900|±  |0.0502|
|                                                           |       |acc_norm|0.4900|±  |0.0502|
|harness:hendrycksTest-college_medicine:5                   |      1|acc     |0.7399|±  |0.0335|
|                                                           |       |acc_norm|0.7399|±  |0.0335|
|harness:hendrycksTest-college_physics:5                    |      1|acc     |0.4804|±  |0.0497|
|                                                           |       |acc_norm|0.4804|±  |0.0497|
|harness:hendrycksTest-computer_security:5                  |      1|acc     |0.7900|±  |0.0409|
|                                                           |       |acc_norm|0.7900|±  |0.0409|
|harness:hendrycksTest-conceptual_physics:5                 |      1|acc     |0.7660|±  |0.0277|
|                                                           |       |acc_norm|0.7660|±  |0.0277|
|harness:hendrycksTest-econometrics:5                       |      1|acc     |0.6140|±  |0.0458|
|                                                           |       |acc_norm|0.6140|±  |0.0458|
|harness:hendrycksTest-electrical_engineering:5             |      1|acc     |0.7172|±  |0.0375|
|                                                           |       |acc_norm|0.7172|±  |0.0375|
|harness:hendrycksTest-elementary_mathematics:5             |      1|acc     |0.5476|±  |0.0256|
|                                                           |       |acc_norm|0.5476|±  |0.0256|
|harness:hendrycksTest-formal_logic:5                       |      1|acc     |0.5714|±  |0.0443|
|                                                           |       |acc_norm|0.5714|±  |0.0443|
|harness:hendrycksTest-global_facts:5                       |      1|acc     |0.5400|±  |0.0501|
|                                                           |       |acc_norm|0.5400|±  |0.0501|
|harness:hendrycksTest-high_school_biology:5                |      1|acc     |0.8774|±  |0.0187|
|                                                           |       |acc_norm|0.8774|±  |0.0187|
|harness:hendrycksTest-high_school_chemistry:5              |      1|acc     |0.6552|±  |0.0334|
|                                                           |       |acc_norm|0.6552|±  |0.0334|
|harness:hendrycksTest-high_school_computer_science:5       |      1|acc     |0.8400|±  |0.0368|
|                                                           |       |acc_norm|0.8400|±  |0.0368|
|harness:hendrycksTest-high_school_european_history:5       |      1|acc     |0.8545|±  |0.0275|
|                                                           |       |acc_norm|0.8545|±  |0.0275|
|harness:hendrycksTest-high_school_geography:5              |      1|acc     |0.8990|±  |0.0215|
|                                                           |       |acc_norm|0.8990|±  |0.0215|
|harness:hendrycksTest-high_school_government_and_politics:5|      1|acc     |0.9637|±  |0.0135|
|                                                           |       |acc_norm|0.9637|±  |0.0135|
|harness:hendrycksTest-high_school_macroeconomics:5         |      1|acc     |0.7718|±  |0.0213|
|                                                           |       |acc_norm|0.7718|±  |0.0213|
|harness:hendrycksTest-high_school_mathematics:5            |      1|acc     |0.4259|±  |0.0301|
|                                                           |       |acc_norm|0.4259|±  |0.0301|
|harness:hendrycksTest-high_school_microeconomics:5         |      1|acc     |0.8403|±  |0.0238|
|                                                           |       |acc_norm|0.8403|±  |0.0238|
|harness:hendrycksTest-high_school_physics:5                |      1|acc     |0.5695|±  |0.0404|
|                                                           |       |acc_norm|0.5695|±  |0.0404|
|harness:hendrycksTest-high_school_psychology:5             |      1|acc     |0.9193|±  |0.0117|
|                                                           |       |acc_norm|0.9193|±  |0.0117|
|harness:hendrycksTest-high_school_statistics:5             |      1|acc     |0.6852|±  |0.0317|
|                                                           |       |acc_norm|0.6852|±  |0.0317|
|harness:hendrycksTest-high_school_us_history:5             |      1|acc     |0.9020|±  |0.0209|
|                                                           |       |acc_norm|0.9020|±  |0.0209|
|harness:hendrycksTest-high_school_world_history:5          |      1|acc     |0.9030|±  |0.0193|
|                                                           |       |acc_norm|0.9030|±  |0.0193|
|harness:hendrycksTest-human_aging:5                        |      1|acc     |0.8027|±  |0.0267|
|                                                           |       |acc_norm|0.8027|±  |0.0267|
|harness:hendrycksTest-human_sexuality:5                    |      1|acc     |0.8550|±  |0.0309|
|                                                           |       |acc_norm|0.8550|±  |0.0309|
|harness:hendrycksTest-international_law:5                  |      1|acc     |0.9256|±  |0.0240|
|                                                           |       |acc_norm|0.9256|±  |0.0240|
|harness:hendrycksTest-jurisprudence:5                      |      1|acc     |0.8704|±  |0.0325|
|                                                           |       |acc_norm|0.8704|±  |0.0325|
|harness:hendrycksTest-logical_fallacies:5                  |      1|acc     |0.8466|±  |0.0283|
|                                                           |       |acc_norm|0.8466|±  |0.0283|
|harness:hendrycksTest-machine_learning:5                   |      1|acc     |0.6875|±  |0.0440|
|                                                           |       |acc_norm|0.6875|±  |0.0440|
|harness:hendrycksTest-management:5                         |      1|acc     |0.8835|±  |0.0318|
|                                                           |       |acc_norm|0.8835|±  |0.0318|
|harness:hendrycksTest-marketing:5                          |      1|acc     |0.9316|±  |0.0165|
|                                                           |       |acc_norm|0.9316|±  |0.0165|
|harness:hendrycksTest-medical_genetics:5                   |      1|acc     |0.8000|±  |0.0402|
|                                                           |       |acc_norm|0.8000|±  |0.0402|
|harness:hendrycksTest-miscellaneous:5                      |      1|acc     |0.8902|±  |0.0112|
|                                                           |       |acc_norm|0.8902|±  |0.0112|
|harness:hendrycksTest-moral_disputes:5                     |      1|acc     |0.8497|±  |0.0192|
|                                                           |       |acc_norm|0.8497|±  |0.0192|
|harness:hendrycksTest-moral_scenarios:5                    |      1|acc     |0.7698|±  |0.0141|
|                                                           |       |acc_norm|0.7698|±  |0.0141|
|harness:hendrycksTest-nutrition:5                          |      1|acc     |0.8464|±  |0.0206|
|                                                           |       |acc_norm|0.8464|±  |0.0206|
|harness:hendrycksTest-philosophy:5                         |      1|acc     |0.8264|±  |0.0215|
|                                                           |       |acc_norm|0.8264|±  |0.0215|
|harness:hendrycksTest-prehistory:5                         |      1|acc     |0.8580|±  |0.0194|
|                                                           |       |acc_norm|0.8580|±  |0.0194|
|harness:hendrycksTest-professional_accounting:5            |      1|acc     |0.5887|±  |0.0294|
|                                                           |       |acc_norm|0.5887|±  |0.0294|
|harness:hendrycksTest-professional_law:5                   |      1|acc     |0.6069|±  |0.0125|
|                                                           |       |acc_norm|0.6069|±  |0.0125|
|harness:hendrycksTest-professional_medicine:5              |      1|acc     |0.8162|±  |0.0235|
|                                                           |       |acc_norm|0.8162|±  |0.0235|
|harness:hendrycksTest-professional_psychology:5            |      1|acc     |0.8317|±  |0.0151|
|                                                           |       |acc_norm|0.8317|±  |0.0151|
|harness:hendrycksTest-public_relations:5                   |      1|acc     |0.7545|±  |0.0412|
|                                                           |       |acc_norm|0.7545|±  |0.0412|
|harness:hendrycksTest-security_studies:5                   |      1|acc     |0.8286|±  |0.0241|
|                                                           |       |acc_norm|0.8286|±  |0.0241|
|harness:hendrycksTest-sociology:5                          |      1|acc     |0.9154|±  |0.0197|
|                                                           |       |acc_norm|0.9154|±  |0.0197|
|harness:hendrycksTest-us_foreign_policy:5                  |      1|acc     |0.9100|±  |0.0288|
|                                                           |       |acc_norm|0.9100|±  |0.0288|
|harness:hendrycksTest-virology:5                           |      1|acc     |0.5361|±  |0.0388|
|                                                           |       |acc_norm|0.5361|±  |0.0388|
|harness:hendrycksTest-world_religions:5                    |      1|acc     |0.8889|±  |0.0241|
|                                                           |       |acc_norm|0.8889|±  |0.0241|
|harness:truthfulqa:mc:0                                    |      1|mc1     |0.5300|±  |0.0175|
|                                                           |       |mc2     |0.6937|±  |0.0149|
|harness:winogrande:5                                       |      0|acc     |0.8556|±  |0.0099|
|harness:gsm8k:5                                            |      0|acc     |0.6785|±  |0.0129|
|all                                                        |      0|acc     |0.7582|±  |0.0285|
|                                                           |       |acc_norm|0.7616|±  |0.0291|
|                                                           |       |mc1     |0.5300|±  |0.0175|
|                                                           |       |mc2     |0.6937|±  |0.0149|
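
For anyone who wants a single MMLU-style number out of a table like the one above, here is a minimal sketch. It assumes the table has been pasted as plain text and that an unweighted mean over the `hendrycksTest` subtasks is what you want; this thread does not say how the leaderboard itself aggregates. `parse_scores` and `mean_metric` are illustrative names, not part of any leaderboard tooling, and `SAMPLE` holds only a few rows copied from the table.

```python
from statistics import mean

# A few rows copied from the table above; paste the full table for real use.
SAMPLE = """
|                           Task                            |Version| Metric |Value |   |Stderr|
|-----------------------------------------------------------|------:|--------|-----:|---|-----:|
|harness:arc:challenge:25                                   |      0|acc     |0.6860|±  |0.0136|
|                                                           |       |acc_norm|0.7287|±  |0.0130|
|harness:hendrycksTest-abstract_algebra:5                   |      1|acc     |0.3900|±  |0.0490|
|                                                           |       |acc_norm|0.3900|±  |0.0490|
|harness:hendrycksTest-anatomy:5                            |      1|acc     |0.6667|±  |0.0407|
|                                                           |       |acc_norm|0.6667|±  |0.0407|
"""


def parse_scores(table_text):
    """Return {task: {metric: value}} from a harness-style Markdown table."""
    scores = {}
    task = None
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 4:
            continue  # blank line or non-table text
        name, _version, metric, value = cells[:4]
        try:
            number = float(value)
        except ValueError:
            continue  # header or separator row
        if name:
            task = name  # rows with an empty Task cell belong to the previous task
        if task is None:
            continue
        scores.setdefault(task, {})[metric] = number
    return scores


def mean_metric(scores, prefix, metric="acc"):
    """Unweighted mean of `metric` over all tasks whose name starts with `prefix`."""
    values = [m[metric] for t, m in scores.items()
              if t.startswith(prefix) and metric in m]
    return mean(values)


if __name__ == "__main__":
    parsed = parse_scores(SAMPLE)
    print(mean_metric(parsed, "harness:hendrycksTest-"))
```

The mean here covers only the rows in `SAMPLE`, so it will not match any leaderboard figure until the full table is pasted in.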
clefourrier changed discussion status to closed

Just reopening as requested, since I don't see the results on the leaderboard.

clefourrier changed discussion status to open

Found it. I had to turn off a search setting.

dnhkng changed discussion status to closed
