Add evaluation results on conll2003 dataset #1

by autoevaluator HF staff - opened

Beep boop, I am a bot from Hugging Face's automatic model evaluator 👋!
Your model has been evaluated on the conll2003 dataset by @douwekiela, using the predictions stored here.
Accept this pull request to see the results displayed on the Hub leaderboard.
Evaluate your model on more datasets here.

@douwekiela @lewtun. QQ: Why is the F1 so much higher than when I evaluated it?

Good question :) This job was run on the validation split, so maybe the results you have were reported on the test split instead?

On the model card there are results for both validation and test.

Hmm, interesting ... this might warrant a manual verification to identify the source of the discrepancy (it would be really bad if there's a bug that skews all "verified" evaluations!)

cc @abhishek who might have done similar comparisons before

could you please share the evaluation function you used @philschmid ? :)


philschmid changed pull request status to merged
