dataframe-improvement

#671
Hugging Face H4 org
No description provided.
Hugging Face H4 org
edited 19 days ago

Changes in app.py:

  • Most importantly, I split up init_space() into three functions for clarity and now we have download_dataset() function with logging. May improve this function later.
  • Got rid of unnecessary copy creation with copy().
  • init_space() now returns only leaderboard_df, raw_data, eval_queue_dfs.
  • plot_df is only created when it's needed with the new load_and_create_plots() function.

Changes in populate.py:

  • commented prints :D
Hugging Face H4 org
edited 19 days ago

Changes in collections.py:

  • split up the update_collections() function into distinct parts for filtering data, adding models to the collection, and cleaning up the collection – this should improve readability and maintainability.
  • the conversion and filtering of the params_column are now done only once per model type and size, rather than multiple times.

Changes in populate.py:

  • deleted prints :DD
Hugging Face H4 org

Changes in populate.py:

  • introduced two helper functions _load_json_data and _process_model_data.
  • enhanced readability
clefourrier changed pull request status to open
Hugging Face H4 org

@Wauplin the CI does not seem to open the sub spaces - do we need to change stg in the code?

Hugging Face H4 org

Following the creation of this PR, an ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been started. Any changes pushed to this PR will be synced with the test Space.
Since this PR has been created by a trusted author, the ephemeral Space has been configured with the correct hardware, storage, and secrets.
(This is an automated message.)

Hugging Face H4 org

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

Hugging Face H4 org

Nice changes so far!

  • you'll need to remove the ## UPDATED comments
  • # Data processing for plots now only on demand in the respective Gradio tab > very nice
  • get_evaluation_queue_df > I think you could manage the save logic in one step, by first creating the correct entries list (storing both files and subfolders files) , then applying processing.
  • a function such as _load_json_data could go in an util.py file
  • I think the following keys might be incorrect (not your fault - your code highlights it well) - cc @SaylorTwift
        "PENDING": ["PENDING", "RERUN"],
        "RUNNING": ["RUNNING"],
        "FINISHED": ["FINISHED", "PENDING_NEW_EVAL"],
    }
Hugging Face H4 org

@Wauplin the CI does not seem to open the sub spaces - do we need to change stg in the code?

@clefourrier Looks like it now worked. Have you changed something? 🤔

Hugging Face H4 org
edited 17 days ago

Have you changed something? 🤔

I tagged you XD

Hugging Face H4 org

Clearly this is magic and the CI recognizing their true master XD (could be a matter of publishing the branch too though ^^)

Hugging Face H4 org

could be a matter of publishing the branch too though ^^

Good catch yes! Only PRs with status "open" are deployed (see here, here and here). Will update to get it right!

Clearly this is magic

I don't want it to be magiiiiic ! 😭

Hugging Face H4 org

@alozowski for the CI to work, you'll need to add a small check after CACHE_PATH = os.getenv("HF_HOME", ".") in env to check if this is a path we have access to, else replace it by .

Hugging Face H4 org
edited 16 days ago

Found a bug in my new app.py, the "private or deleted" button doesn't work – fixing

Hugging Face H4 org

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

Hugging Face H4 org
edited 16 days ago

Changes in app.py:

  • the bug was in row 97 with update_collections, I didn't return both original_df and leaderboard_df.

Changes in envs.py:

  • added a check after CACHE_PATH.

Changes in populate.py:

  • get_evaluation_queue_df now handles save_path with pathlib.

Changes in utils.py:

  • now it contains load_json_data function.

I've finished the changes I wanted to make here and I'm ready to merge, please check @clefourrier . Btw, I can apply Makefile style before merge

Hugging Face H4 org

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

Hugging Face H4 org

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

Hugging Face H4 org
edited 16 days ago

Hi @alozowski :)

General comments

  • Looks good to merge once next items are fixed!
  • Not sure the check after CACHE_PATH is completely working, as the CI space is still broken for path reasons.
  • How did you test the collection udpate?

Specific code nits

  • app.py, line 80: the break should probably be removed as the space restart will leave the execution anyway
  • src/populate.py, line 33: nice idea to use rglob! Why not directly use '*.json' as pattern then? It will allow you to remove all checks
  • src/tools/collection.py, line 65: the [:10] is redundant since you define the number above already
Hugging Face H4 org

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

Hugging Face H4 org
edited 15 days ago

Debugging this CACHE_PATH problem, can be lots of commits

Hugging Face H4 org

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

Hugging Face H4 org

No problem!

Hugging Face H4 org

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

Hugging Face H4 org

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

Hugging Face H4 org
edited 15 days ago

Why does CI pass the HF_HOME env variable to which it has no write permission? Changing the env variables during the runtime doesn't seem to be a good thing and I won't be able to fix it myself 😱

Is it possible to make CI pass the correct parameters? @clefourrier

Hugging Face H4 org

The CI does not have the same hardware as the front end of the leaderboard - when you have attached permanent storage (like the leaderboard), it's mounted at "/data" - but here the CI space does not have this, hence needs to store things at "."

Hugging Face H4 org

@Wauplin can we use different env vars for the CI?

Hugging Face H4 org

Following new commits that happened in this PR, the ephemeral Space HuggingFaceH4/open_llm_leaderboard-ci-pr-671 has been updated.
(This is an automated message.)

Hugging Face H4 org

In the meantime, I changed the other parts:

  • If update_collections is broken or applied incorrectly, you will not be able to filter the models in the leaderboard table – but it's my manual testing so far + checking with prints before and after... I'm open to any ideas on how to create better tests for such changes
  • fixed app.py line 80
  • you're right @clefourrier , the "*.json" pattern in src/populate.py works well without other checks
  • fixed line 65 in src/tools/collections.py

p.s. the name of the last commit should've been "small fixes", a typo :DD

Hugging Face H4 org

update_collection should not be related to the filtering in the table - if it fails, it should log it but not affect the rest of the space

Hugging Face H4 org

Otherwise, looks good to me! Merging!

clefourrier changed pull request status to merged
Hugging Face H4 org

PR is now merged/closed. The ephemeral Space has been deleted.
(This is an automated message.)

Hugging Face H4 org

It's way 'deeper' with update_collection as I investigated... I'll explain later /mysteriously leaves/ :D

Thanks for the merge! 🤩

Sign up or log in to comment