Identifying flagged datasets
#723 by jerome-white - opened
What's the best way to identify datasets that are flagged? In the leaderboard's "about" tab, it says such datasets will have augmented names:
If a model's name contains "Flagged", this indicates it has been flagged by the community, and should probably be ignored! Clicking the link will redirect you to the discussion about the model.
However, none of the datasets seem to have that convention:
In [1]: from huggingface_hub import HfApi
In [2]: api = HfApi()
In [3]: ls = api.list_datasets(author='open-llm-leaderboard', search='details_')
In [4]: sum('flagged' in x.id.casefold() for x in ls)
Out[4]: 0
Even though some seem to be flagged. For example, based on my read of filter_models.py:
In [5]: ls = api.list_datasets(author='open-llm-leaderboard', search='details_')
   ...: for i in ls:
   ...:     if i.id.find('Voicelab') >= 0 and i.id.endswith('trurl-2-13b'):
   ...:         print(i)
   ...:
DatasetInfo(id='open-llm-leaderboard/details_Voicelab__trurl-2-13b', author='open-llm-leaderboard', sha='35959ef2290eadcc4110cc93990998f4ccbd95b1', created_at=datetime.datetime(2023, 8, 18, 18, 56, 24, tzinfo=datetime.timezone.utc), last_modified=datetime.datetime(2023, 10, 13, 14, 1, 42, tzinfo=datetime.timezone.utc), private=False, gated=False, disabled=False, downloads=21, likes=0, paperswithcode_id=None, tags=['region:us'], card_data=None, siblings=None)
The leaderboard has an option for toggling flagged datasets -- how is that compiled?
Hi!
Just to clarify one thing, it's not datasets but models that are flagged :)
This is actually a list which is hardcoded here (depending on user reports) and parsed at table creation.
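Since the flag lives on the model rather than on the dataset, one way to answer the original question is to map each details dataset back to its model name and check that name against the hardcoded list. The snippet below is only a rough sketch under a few assumptions: `FLAGGED_MODELS` is a hypothetical stand-in for the real list (its single entry is the model from the question above), the helper names are made up for illustration, and the `details_<org>__<model>` naming is inferred from the one dataset id shown in this thread.

```python
from typing import Optional

# Hypothetical stand-in for the hardcoded list of flagged models.
# The single entry is the example from the question, not the real list.
FLAGGED_MODELS = {"Voicelab/trurl-2-13b"}

# Assumed naming convention, inferred from the dataset id shown above.
DETAILS_PREFIX = "open-llm-leaderboard/details_"


def dataset_id_to_model_id(dataset_id: str) -> Optional[str]:
    """Turn 'open-llm-leaderboard/details_Org__name' into 'Org/name'."""
    if not dataset_id.startswith(DETAILS_PREFIX):
        return None
    name = dataset_id[len(DETAILS_PREFIX):]
    # Assumption: the first '__' stands in for the '/' of the model id.
    org, sep, model = name.partition("__")
    return f"{org}/{model}" if sep else name


def is_flagged(dataset_id: str, flagged=FLAGGED_MODELS) -> bool:
    """True if the dataset's model appears in the flagged collection."""
    return dataset_id_to_model_id(dataset_id) in flagged


print(is_flagged("open-llm-leaderboard/details_Voicelab__trurl-2-13b"))
```

In practice the ids would come from `api.list_datasets(...)` as in the question, and the flagged collection would be filled from the actual hardcoded list rather than typed by hand.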
clefourrier changed discussion status to closed