"already evaluated" Bug?

#1
by Tristan - opened

It looks like when I select a dataset and model that have already been evaluated, I get a message saying the evaluation has already happened. I think this is generally a good thing to do, but I get this message even if the metrics I select are different from last time. Is this intentional?

Evaluation on the Hub org

Great question! The current hash I use for determining whether a model has been evaluated is based on the following (see the sketch after this list):

  • The task
  • The model ID
  • The dataset name
  • The dataset config
  • The dataset split
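
To make the collision concrete, here is a minimal sketch of that kind of fingerprint (the function and field names are hypothetical, not the actual Space code). Because the metrics are not part of the hashed payload, two jobs that differ only in their metric selection produce the same hash:

```python
import hashlib
import json

def job_hash(task: str, model_id: str, dataset: str, config: str, split: str) -> str:
    """Hypothetical fingerprint for an evaluation job (metrics intentionally absent)."""
    payload = {
        "task": task,
        "model_id": model_id,
        "dataset": dataset,
        "config": config,
        "split": split,
    }
    # Serialize with sorted keys so the hash is stable across runs.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

# Two jobs that differ only in their selected metrics would collide on this value:
print(job_hash("text-classification", "bert-base-uncased", "imdb", "plain_text", "test"))
```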

So in your case, the model is considered "evaluated" even if the user-selected metrics differ across jobs.

I think the natural fix here is to include the list of metrics in the hash. However, this adds some complexity because we will end up with Hub PRs containing duplicate evaluation info (i.e. the default metrics like accuracy and F1 score will be appended to the model card twice).

So to make this clean, I think we'd need a mechanism in the backend (see the sketch below) that:

  • Checks which metrics have already been computed in previous jobs
  • Filters for just the new metrics
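
Something like the following sketch could express that filter (the function name and the shape of the job records are assumptions, not the actual backend API):

```python
def new_metrics(requested: list[str], previous_jobs: list[dict]) -> list[str]:
    """Return only the metrics that have not been computed in earlier jobs.

    Hypothetical sketch: `previous_jobs` is assumed to be a list of past
    evaluation records, each carrying a "metrics" field.
    """
    already_computed = {m for job in previous_jobs for m in job.get("metrics", [])}
    return [m for m in requested if m not in already_computed]

# Example: accuracy and F1 were computed before, so only "recall" is new.
previous = [{"metrics": ["accuracy", "f1"]}]
print(new_metrics(["accuracy", "f1", "recall"], previous))  # ['recall']
```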

cc @abhishek

Evaluation on the Hub org

We now include the list of user metrics in the hash used to detect previously evaluated models. This will catch the majority of duplicate jobs, but extra work is needed in the backend to handle cases like:

  • User A selects metric X
  • User B selects metric Y
  • User C selects metrics [X,Y]

In the current approach, the last scenario is treated as a new evaluation relative to the previous two, so metrics X and Y end up being computed and appended to the model card twice.
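
To illustrate with a hedged sketch (continuing the hypothetical fingerprint from above): hashing the whole metric list means [X], [Y], and [X, Y] all produce distinct hashes, so the overlap is invisible at the job level and only a metric-level diff in the backend can catch it.

```python
import hashlib
import json

def job_hash_with_metrics(base: dict, metrics: list[str]) -> str:
    """Hypothetical metric-aware fingerprint: same base job plus a sorted metric list."""
    payload = {**base, "metrics": sorted(metrics)}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

base = {"task": "text-classification", "model_id": "bert-base-uncased",
        "dataset": "imdb", "config": "plain_text", "split": "test"}

# Three distinct hashes, so user C's job is not deduplicated even though
# X and Y were each already computed by users A and B.
for selection in (["X"], ["Y"], ["X", "Y"]):
    print(selection, job_hash_with_metrics(base, selection)[:12])
```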
