Add evaluation results on conll2003 dataset

by autoevaluator HF staff - opened Jun 24, 2022

base: refs/heads/main ← from: refs/pr/1 (+41 / -13)

autoevaluator

Jun 24, 2022

Beep boop, I am a bot from Hugging Face's automatic model evaluator 👋!
Your model has been evaluated on the conll2003 dataset by @douwekiela, using the predictions stored here.
Accept this pull request to see the results displayed on the Hub leaderboard.
Evaluate your model on more datasets here.

Add evaluation results on conll2003 dataset (d97045e2)

philschmid

Owner Jun 24, 2022

@douwekiela @lewtun Quick question: why is the F1 so much higher than when I evaluated it?

lewtun

Jun 24, 2022

Good question :) This job was run on the validation split, so maybe the results you have were reported on the test split instead?

philschmid

Owner Jun 24, 2022

The model card has results for both the validation and test splits.

lewtun

Jun 24, 2022

Hmm, interesting... This might warrant a manual verification to identify the source of the discrepancy (it would be really bad if there's a bug that skews all "verified" evaluations!)

cc @abhishek who might have done similar comparisons before

abhishek

Jun 24, 2022

Could you please share the evaluation function you used, @philschmid? :)

philschmid

Owner Jun 24, 2022

That's the script I used for training and evaluation: https://github.com/philschmid/distilroberta-token-classification/blob/master/src/training/train.py
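As a hedged illustration of why the exact evaluation function matters when comparing scores: NER results on CoNLL-2003 are usually reported as entity-level F1 (in the style of seqeval), but a token-level F1 computed on the same predictions can come out noticeably different. The thread does not confirm that this is the cause of the gap here. The sketch below is self-contained and hypothetical: `extract_spans`, `entity_f1`, and `token_f1` are illustrative helpers, not the linked training script, the autoevaluator's code, or the seqeval API. It assumes IOB2 tags.

```python
from typing import List, Set, Tuple

# (sentence index, entity type, start token, end token exclusive)
Span = Tuple[int, str, int, int]


def extract_spans(sentences: List[List[str]]) -> Set[Span]:
    """Collect entity spans from IOB2 tags (B-X starts an entity, I-X continues it)."""
    spans: Set[Span] = set()
    for s_idx, tags in enumerate(sentences):
        start, etype = None, None
        for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
            continues = tag.startswith("I-") and etype == tag[2:]
            if start is not None and not continues:
                spans.add((s_idx, etype, start, i))
                start, etype = None, None
            # A stray I-X without a preceding B-X is leniently treated as a start.
            if tag.startswith("B-") or (tag.startswith("I-") and start is None):
                start, etype = i, tag[2:]
    return spans


def f1(tp: int, n_pred: int, n_gold: int) -> float:
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro F1 where a prediction counts only if the whole span and type match."""
    g, p = extract_spans(gold), extract_spans(pred)
    return f1(len(g & p), len(p), len(g))


def token_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro F1 over individual non-"O" token labels."""
    tp = n_pred = n_gold = 0
    for g_tags, p_tags in zip(gold, pred):
        for g, p in zip(g_tags, p_tags):
            n_gold += int(g != "O")
            n_pred += int(p != "O")
            tp += int(g != "O" and g == p)
    return f1(tp, n_pred, n_gold)


# One sentence where the PER entity is only partially predicted.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "O", "O", "B-LOC"]]
print(round(entity_f1(gold, pred), 3), round(token_f1(gold, pred), 3))  # 0.5 0.8
```

The truncated PER span earns no credit at the entity level but still contributes a correct token at the token level, which is why the two numbers diverge on identical predictions.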

lewtun

Jun 24, 2022

😍

philschmid changed pull request status to merged Jun 24, 2022
