Align task name and type with Hub taxonomy #3

by lewtun HF staff - opened

This PR proposes to align task name and type for the self-reported evaluation with the Hub taxonomy (i.e. the high-level tasks defined in

The self-reported results will then become visible on this PwC leaderboard:

cc @julien-c

philschmid changed pull request status to merged

why don't you just group all the metrics into the same (task, dataset) tuple, then? would be cleaner, no?

Yes it would be cleaner that way, but self-reported evaluations rarely specify the dataset config / split that was used. This means you can't group the verified and self-reported metrics under a single dataset field.

A unique grouping would be something like (task, dataset_id, dataset_config, dataset_split). I'll double-check whether the metadata_update() function from huggingface_hub that we use automatically groups along those fields.
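To make the grouping concrete, here is a minimal sketch in plain Python. It does not call huggingface_hub and says nothing about what metadata_update() actually does. It assumes each result is a dict with `task`, `dataset` and `metrics` entries, loosely modelled on the model card metadata. The helper name `group_results`, the dataset id `example_dataset` and all metric values are hypothetical placeholders.

```python
from collections import defaultdict


def group_results(results):
    """Group metrics by (task, dataset_id, dataset_config, dataset_split).

    Hypothetical helper for illustration only. A missing config or split
    becomes None in the key.
    """
    grouped = defaultdict(list)
    for result in results:
        dataset = result["dataset"]
        key = (
            result["task"]["type"],
            dataset["type"],
            dataset.get("config"),  # often absent in self-reported results
            dataset.get("split"),   # often absent in self-reported results
        )
        grouped[key].extend(result["metrics"])
    return dict(grouped)


# Placeholder entries: names and values are made up, not real results.
results = [
    {   # self-reported: no config / split given
        "task": {"type": "question-answering"},
        "dataset": {"type": "example_dataset"},
        "metrics": [{"type": "f1", "value": 0.0, "verified": False}],
    },
    {   # verified: config and split are known
        "task": {"type": "question-answering"},
        "dataset": {"type": "example_dataset", "config": "default", "split": "validation"},
        "metrics": [{"type": "f1", "value": 0.0, "verified": True}],
    },
    {   # verified: same key as the entry above, so the metrics are merged
        "task": {"type": "question-answering"},
        "dataset": {"type": "example_dataset", "config": "default", "split": "validation"},
        "metrics": [{"type": "exact_match", "value": 0.0, "verified": True}],
    },
]

grouped = group_results(results)
for key, metrics in grouped.items():
    print(key, [m["type"] for m in metrics])
```

With this key, the self-reported entry lands in its own group because its config and split are None, which is the reason the verified and self-reported metrics can't share a single dataset field.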

