open-llm-leaderboard/open_llm_leaderboard

alozowski

Open LLM Leaderboard org Apr 10

No description provided.

Updated init_space() mostly e34e3571

alozowski

Open LLM Leaderboard org Apr 12

edited Apr 12

Changes in app.py:

Most importantly, I split init_space() into three functions for clarity, and we now have a download_dataset() function with logging. I may improve this function later.
Got rid of unnecessary copies created with copy().
init_space() now returns only leaderboard_df, raw_data, eval_queue_dfs.
plot_df is only created when it's needed with the new load_and_create_plots() function.
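
As a rough illustration of the download_dataset() change described above, a download helper with logging might look like the sketch below. The signature, the injected fetch callable, and the log messages are assumptions made for this example, not the code in this PR; in the Space, fetch would wrap whatever call actually downloads the dataset.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def download_dataset(repo_id: str, local_dir: str, fetch: Callable[[str, str], None]) -> None:
    """Download one dataset repo and log what happens (hypothetical signature)."""
    logger.info("Downloading %s to %s", repo_id, local_dir)
    try:
        # `fetch` is an assumed stand-in for the real download call.
        fetch(repo_id, local_dir)
    except Exception:
        # Log the full traceback, then let the caller decide what to do.
        logger.exception("Failed to download %s", repo_id)
        raise
    logger.info("Finished downloading %s", repo_id)
```

Passing the download call in as a parameter keeps the helper easy to exercise without network access; the real function may well call the download API directly.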

Changes in populate.py:

commented out prints :D

Updated collections.py 2293858b

alozowski

Open LLM Leaderboard org Apr 12

edited Apr 12

Changes in collections.py:

split up the update_collections() function into distinct parts for filtering data, adding models to the collection, and cleaning up the collection – this should improve readability and maintainability.
the conversion and filtering of the params_column are now done only once per model type and size, rather than multiple times.

Changes in populate.py:

deleted prints :DD

Updated populate.py 6b9cbbe7

alozowski

Open LLM Leaderboard org Apr 12

Changes in populate.py:

introduced two helper functions _load_json_data and _process_model_data.
enhanced readability

clefourrier changed pull request status to open Apr 15

clefourrier

Open LLM Leaderboard org Apr 15

@Wauplin the CI does not seem to open the sub spaces - do we need to change stg in the code?

clefourrier

Open LLM Leaderboard org Apr 15

Following the creation of this PR, an ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been started. Any changes pushed to this PR will be synced with the test Space.
Since this PR has been created by a trusted author, the ephemeral Space has been configured with the correct hardware, storage, and secrets.
(This is an automated message.)

Updated gitignore 122c7afd

clefourrier

Open LLM Leaderboard org Apr 15

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

clefourrier

Open LLM Leaderboard org Apr 15

Nice changes so far!

you'll need to remove the ## UPDATED comments
# Data processing for plots now only on demand in the respective Gradio tab > very nice
get_evaluation_queue_df > I think you could manage the save logic in one step, by first creating the correct entries list (storing both files and subfolder files), then applying the processing.
a function such as _load_json_data could go in a util.py file
I think the following keys might be incorrect (not your fault - your code highlights it well) - cc @SaylorTwift

        "PENDING": ["PENDING", "RERUN"],
        "RUNNING": ["RUNNING"],
        "FINISHED": ["FINISHED", "PENDING_NEW_EVAL"],
    }

Wauplin

Open LLM Leaderboard org Apr 15

@Wauplin the CI does not seem to open the sub spaces - do we need to change stg in the code?

@clefourrier Looks like it now worked. Have you changed something? 🤔

clefourrier

Open LLM Leaderboard org Apr 15

edited Apr 15

Have you changed something? 🤔

I tagged you XD

clefourrier

Open LLM Leaderboard org Apr 15

Clearly this is magic and the CI recognizing their true master XD (could be a matter of publishing the branch too though ^^)

Wauplin

Open LLM Leaderboard org Apr 15

could be a matter of publishing the branch too though ^^

Good catch yes! Only PRs with status "open" are deployed (see here, here and here). Will update to get it right!

Clearly this is magic

I don't want it to be magiiiiic ! 😭

clefourrier

Open LLM Leaderboard org Apr 15

@alozowski for the CI to work, you'll need to add a small check after CACHE_PATH = os.getenv("HF_HOME", ".") in envs.py to verify that this is a path we have access to, and otherwise replace it with "."
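
A minimal sketch of such a check, assuming that "a path we have access to" means an existing, writable directory. The helper name resolve_cache_path is hypothetical and not taken from the PR:

```python
import os


def resolve_cache_path(default: str = ".") -> str:
    """Return HF_HOME if it is a writable directory, else fall back to `default`."""
    candidate = os.getenv("HF_HOME", default)
    # Assumption: "accessible" means the directory exists and is writable.
    if os.path.isdir(candidate) and os.access(candidate, os.W_OK):
        return candidate
    return default


CACHE_PATH = resolve_cache_path()
```

This keeps HF_HOME when persistent storage is mounted and writable, and falls back to the working directory when it is not.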

alozowski

Open LLM Leaderboard org Apr 15

edited Apr 15

Found a bug in my new app.py, the "private or deleted" button doesn't work – fixing

bugfix and populate refactoring 2e74c814

clefourrier

Open LLM Leaderboard org Apr 16

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

alozowski

Open LLM Leaderboard org Apr 16

edited Apr 16

Changes in app.py:

the bug was in row 97 with update_collections: I didn't return both original_df and leaderboard_df.

Changes in envs.py:

added a check after CACHE_PATH.

Changes in populate.py:

get_evaluation_queue_df now handles save_path with pathlib.

Changes in utils.py:

it now contains the load_json_data function.

I've finished the changes I wanted to make here and I'm ready to merge, please check @clefourrier. Btw, I can apply the Makefile style before merging

updated utils.py f073c676

removed comments from populate.py 79ad1ade

clefourrier

Open LLM Leaderboard org Apr 16

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

clefourrier

Open LLM Leaderboard org Apr 16

edited Apr 16

Hi @alozowski :)

General comments

Looks good to merge once the following items are fixed!
Not sure the check after CACHE_PATH is completely working, as the CI space is still broken for path reasons.
How did you test the collection update?

Specific code nits

app.py, line 80: the break should probably be removed, as the space restart will end the execution anyway
src/populate.py, line 33: nice idea to use rglob! Why not directly use '*.json' as pattern then? It will allow you to remove all checks
src/tools/collections.py, line 65: the [:10] is redundant since you define the number above already
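
To illustrate the rglob suggestion above, here is a small sketch that collects every JSON file under a directory, including subfolders, with a single pattern. The function name load_json_entries is hypothetical, and the sketch assumes no directory under the path is itself named *.json:

```python
import json
from pathlib import Path


def load_json_entries(save_path):
    """Load every *.json file under save_path, recursing into subfolders."""
    entries = []
    # rglob("*.json") walks save_path recursively, so top-level files and
    # files in subfolders are handled by the same loop without extra checks.
    for json_file in sorted(Path(save_path).rglob("*.json")):
        with json_file.open(encoding="utf-8") as f:
            entries.append(json.load(f))
    return entries
```

Because the pattern already filters on the extension, non-JSON files never reach the loop body.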

fixing envs CACHE_PATH check 63dac327

clefourrier

Open LLM Leaderboard org Apr 16

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

alozowski

Open LLM Leaderboard org Apr 16

edited Apr 16

Debugging this CACHE_PATH problem, so there may be a lot of commits

debugging CACHE_PATH in envs.py 6a5081fb

clefourrier

Open LLM Leaderboard org Apr 16

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

clefourrier

Open LLM Leaderboard org Apr 16

No problem!

debugging CACHE_PATH in envs.py e243a5f6

clefourrier

Open LLM Leaderboard org Apr 16

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

debugging CACHE_PATH in envs.py 5a8f7dc9

clefourrier

Open LLM Leaderboard org Apr 16

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

alozowski

Open LLM Leaderboard org Apr 16

edited Apr 16

Why does the CI pass an HF_HOME env variable that points to a path it has no write permission for? Changing env variables at runtime doesn't seem like a good idea, and I won't be able to fix it myself 😱

Is it possible to make CI pass the correct parameters? @clefourrier

clefourrier

Open LLM Leaderboard org Apr 16

The CI does not have the same hardware as the front end of the leaderboard - when you have attached permanent storage (like the leaderboard), it's mounted at "/data" - but here the CI space does not have this, hence needs to store things at "."

clefourrier

Open LLM Leaderboard org Apr 16

@Wauplin can we use different env vars for the CI?

small fixed d8bf61b2

clefourrier

Open LLM Leaderboard org Apr 16

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

alozowski

Open LLM Leaderboard org Apr 16

In the meantime, I changed the other parts:

If update_collections is broken or applied incorrectly, you will not be able to filter the models in the leaderboard table. So far that manual check, plus prints before and after, is the only testing I've done... I'm open to any ideas on how to create better tests for such changes
fixed app.py line 80
you're right @clefourrier, the "*.json" pattern in src/populate.py works well without other checks
fixed line 65 in src/tools/collections.py