
False positives

#2
by abpani1994 - opened

Just a note on the fine-tuning. The model sometimes extracts values like these:
"type": "Email",
"matches": [
{
"value": "direct email",
},
{
"value": "email client",
},
{
"value": "email address",
}

So I feel the data you fine-tuned on has a lot of positive examples. More false positives:
{'start': 0,
'end': 11,
'text': 'Partnership',
'label': 'organization',
'score': 0.8815939426422119},
{'start': 35,
'end': 46,
'text': 'Partnership',
'label': 'organization',
'score': 0.6203173995018005},
{'start': 255,
'end': 263,
'text': 'Investor',
'label': 'person',
'score': 0.7565749287605286},
{'start': 300,
'end': 308,
'text': 'Investor',
'label': 'person',
'score': 0.6845653653144836},

Department for Artificial Intelligence, Jožef Stefan Institute

Thank you for your input.

We fine-tuned the model following the fine-tuning example that GLiNER provides for its models. As with any NER model, it will sometimes extract values that are not in line with your expectations.

We suggest trying a higher threshold to filter out such examples. The threshold is a cutoff on the prediction score, so raising it keeps only the values the model is more certain about. You can also try initializing separate models for different entity types and setting a threshold for each one.
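As a rough sketch of the per-entity threshold idea, the snippet below filters the predictions reported above with a separate cutoff per label. It is not the setup described in the reply (one model per entity type). Instead it assumes a single model run at a low threshold whose output is then post-filtered. The cutoff values and the helper name `filter_by_label_threshold` are hypothetical and only for illustration. In GLiNER itself, a single global cutoff can be passed as the `threshold` argument of `predict_entities`.

```python
# Predictions copied from the report above (scores rounded for brevity).
predictions = [
    {"start": 0, "end": 11, "text": "Partnership", "label": "organization", "score": 0.8816},
    {"start": 35, "end": 46, "text": "Partnership", "label": "organization", "score": 0.6203},
    {"start": 255, "end": 263, "text": "Investor", "label": "person", "score": 0.7566},
    {"start": 300, "end": 308, "text": "Investor", "label": "person", "score": 0.6846},
]

# Hypothetical per-label cutoffs; tune them on your own validation data.
THRESHOLDS = {"organization": 0.9, "person": 0.8}
DEFAULT_THRESHOLD = 0.5  # used for labels without an explicit cutoff


def filter_by_label_threshold(preds, thresholds, default=DEFAULT_THRESHOLD):
    """Keep only predictions whose score reaches the cutoff for their label."""
    return [p for p in preds if p["score"] >= thresholds.get(p["label"], default)]


kept = filter_by_label_threshold(predictions, THRESHOLDS)
print(kept)
```

With these illustrative cutoffs, all four reported false positives are dropped. Cutoffs this high will also discard some correct entities, so they are best chosen by checking precision and recall on a labeled sample.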

Hope this helps.

eriknovak changed discussion status to closed
