There are still problems with size filters

#327
by spacecowgoesmoo - opened

1
There's no longer any way to filter for only 7Bs. According to the #Params column, most 7Bs are slightly above or below exactly 7 billion parameters: 7.24, 7.11, 6.99, 6.74, etc. The new 3-7 and 7-13 filters have effectively cut this category in half, and any complete list of 7Bs is now also filled with 3Bs and 13Bs.

I get why this change was made, since there's more variety in model sizes now, but you could cover both issues by using slightly offset filter ranges: 4-8, 8-15, 15-40, something like that. The boundaries should fall at points where there are the fewest models, so they don't split up existing categories.
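To make the idea concrete, here is a minimal sketch of offset buckets in Python. The boundary values are illustrative examples from the comment above, not a proposal for exact numbers, and `bucket_for` is a hypothetical helper, not leaderboard code:

```python
# Sketch of offset size buckets (boundary values are illustrative).
# The point is that every nominal "7B" model, whether it actually has
# 6.74B or 7.24B parameters, lands in a single 4-8B bucket instead of
# being split across a 3-7 and a 7-13 filter.
BUCKETS = [(0, 1.5), (1.5, 4), (4, 8), (8, 15), (15, 40), (40, float("inf"))]

def bucket_for(params_b):
    """Return the (lo, hi) bucket containing a parameter count in billions."""
    for lo, hi in BUCKETS:
        if lo <= params_b < hi:
            return (lo, hi)
    return None

# All of the "7B" variants from the #Params column share one bucket:
assert all(bucket_for(p) == (4, 8) for p in (7.24, 7.11, 6.99, 6.74))
```

With the current 3-7 / 7-13 boundaries, the same four values would be split across two filters.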

2
If you click multiple filter checkboxes without waiting for the first one to finish processing, the filters don't work correctly. You can test this by quickly unchecking 60+, 35-60, and 13-35. Large models will still be visible even though only models up to 13B should be showing.
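This looks like a classic stale-response race. One common fix, sketched here in plain Python (the names are hypothetical stand-ins, not the leaderboard's actual code): tag every filter request with a sequence number and discard any response that is no longer the latest.

```python
import itertools

# Hypothetical sketch: each checkbox click starts a request tagged with
# a fresh sequence number; a response is applied only if it still
# belongs to the most recent request, so a slow early request can't
# overwrite the result of a later click.
_counter = itertools.count()
_latest = -1

def start_request():
    global _latest
    _latest = next(_counter)
    return _latest

def apply_response(token, rows):
    if token != _latest:  # stale response from an earlier click
        return None
    return rows

first = start_request()   # user unchecks "60+"
second = start_request()  # user quickly unchecks "35-60"
assert apply_response(first, ["60B+ rows"]) is None          # stale, dropped
assert apply_response(second, ["13-35B rows"]) == ["13-35B rows"]
```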

3
The leaderboard shouldn't be making network calls when a filter is used. Fetch all the LLM data on page load, store it locally, and query the local data instead. Column sorting was already updated to work this way, and it should reduce the filter processing time to effectively zero. If you're worried about people leaving their tabs open forever and seeing outdated data, you can also add a JavaScript timer to automatically refresh the page every 24 hours.
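A minimal sketch of that approach, with plain Python standing in for the client-side logic (the row data and field names are made up for illustration):

```python
# Hypothetical sketch: fetch the full table once on page load, cache it,
# then answer every filter action from the local copy so that no further
# network round-trips are needed.
MODELS = [
    {"name": "model-a", "params_b": 6.74},
    {"name": "model-b", "params_b": 13.0},
    {"name": "model-c", "params_b": 70.0},
]

def filter_by_size(rows, lo, hi):
    """Filter the locally cached rows; no network call involved."""
    return [r for r in rows if lo <= r["params_b"] < hi]

up_to_13b = filter_by_size(MODELS, 0, 13)
assert [r["name"] for r in up_to_13b] == ["model-a"]

# The staleness guard suggested above would be a daily full refresh;
# in the browser that would be setTimeout(() => location.reload(), 24*3600*1000).
```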

Open LLM Leaderboard org

Yes, I'm currently modifying the range of model sizes to better match standard model sizes.
I'm not sure I see what you mean by network calls: we have a local dataframe containing all the models and use Gradio to call a filter function that returns a filtered dataframe. Can you elaborate on this?

I don't know exactly how the leaderboard works, but this happens whenever you click on a filter checkbox (of any kind, not just Param size). There's some kind of network request being made each time, which would explain the processing delays. Ideally a filter action would be handled in a local JavaScript data structure, so nothing would appear in the Network tab here. In this screenshot, I changed the 60+ model size checkbox from off to on.

Screenshot 2023-10-15 at 7.59.22 AM.png

In one of these data packets (the second one, with the green arrow), it looks like the leaderboard may be downloading 1.16 MB of new table rows from a remote server as HTML, which isn't good. It'd be faster to query a local copy of the LLM data, which should already have been downloaded on page load. Then just apply style='display: none;' to whatever rows don't match the current filters; the table doesn't need to be rebuilt each time.

Also 30% of that HTML data is duplicated inline CSS. Something like this should be handled in a global CSS file so it doesn't need to be repeated hundreds of times.

Screenshot 2023-10-15 at 8.24.07 AM.png

Thought I would comment here instead of opening a new issue since it seems related. I noticed that if you deselect any filter right after the page loads, some of the models disappear. For example, if you sort by HellaSwag on page load, the pretrained Falcon-180B models are at the top; deselect a filter that isn't even relevant to them, like the "?" (unknown) size, and those models disappear from the list entirely. The same happens with something as simple as filtering columns. Even if you reselect what you deselected, the models don't reappear until you reload/refresh the page.

Open LLM Leaderboard org
edited Oct 16, 2023

@spacecowgoesmoo I hear you; I think it is simply due to the way Gradio works. The Gradio app runs on a remote server, and the client queries the app every time you want to change its state (by checking a box, for example). Unfortunately, there isn't much we can do, but I've reported the issue to the Gradio team.

Open LLM Leaderboard org

Re point 1, I changed the filter sizes to better match what is expected. As mentioned by @SaylorTwift, point 3 is not something we can solve on our end. I'll investigate point 2, but as it's not blocking, I'm closing the issue.
Thank you for your comments! :)

clefourrier changed discussion status to closed
