Token Classification
GLiNER
PyTorch
multilingual

Documentation on labels?

#10
by LPN64 - opened

Hello, is there any specialized documentation on the labels?

For example, on the Russian text you use labels = ["Drugname", "Drugform"].

Why not "drug name"? Does it work in Russian, or does it have to be done in English? Does it work better when the labels are in the same language as the text?

Let's say I want to detect "events", but it fails to detect event names like "Olympic games Paris 2024". With an LLM I could give examples, but here I can't. Can I just put something like "event evenement", since "evenement" is a synonym?

Thanks for the tool.

Owner

You can use "event". I also suggest decreasing the threshold to 0.4 or 0.3 to increase recall, as the model may not be well calibrated for infrequent domains.
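A minimal sketch of what trying lower thresholds could look like. It assumes the `gliner` package's `GLiNER.from_pretrained` and `model.predict_entities(text, labels, threshold=...)`, which returns dicts with `text`, `label`, and `score` keys. The helper name `sweep_thresholds`, the sample sentence, and the checkpoint id are illustrative assumptions, not something stated in this thread.

```python
from typing import Callable, Dict, List, Sequence, Tuple

def sweep_thresholds(
    predict: Callable[..., List[dict]],
    text: str,
    labels: List[str],
    thresholds: Sequence[float] = (0.5, 0.4, 0.3),
) -> Dict[float, List[Tuple[str, str, float]]]:
    """Run the same text and labels at several thresholds.

    `predict` is expected to behave like GLiNER's `model.predict_entities`.
    Returns, per threshold, a list of (span text, label, rounded score).
    """
    results = {}
    for t in thresholds:
        entities = predict(text, labels, threshold=t)
        results[t] = [(e["text"], e["label"], round(e["score"], 2)) for e in entities]
    return results

def main() -> None:
    # Imported lazily so the helper above can be used without the model installed.
    from gliner import GLiNER  # pip install gliner

    # Assumed checkpoint id; substitute the model this discussion belongs to.
    model = GLiNER.from_pretrained("urchade/gliner_multi")
    text = "The Olympic games Paris 2024 will open in July."  # hypothetical sample
    for t, spans in sweep_thresholds(model.predict_entities, text, ["event"]).items():
        print(t, spans)

# Call main() to run the sweep against the real model.
```

Comparing the output across thresholds shows whether the missing span is simply scored below the default cutoff or is not proposed at all.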

Thanks for the quick answer, I'll give it a try tomorrow at the office.

What about the other points mentioned above about the language of the labels? And about multi-label in one string? Bad idea? Never tried?

Owner

What about the other points mentioned above about the language of the labels?

I don't really have an answer to this question, as the model was only fine-tuned on English. You should try it yourself to see what works best.
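One way to "try it yourself" is to run each label phrasing separately on the same text and compare what comes back. This is only a sketch: `compare_label_variants` is a hypothetical helper, the sample text and variants are made up for illustration, and it assumes `model.predict_entities(text, labels, threshold=...)` from the `gliner` package, which returns dicts with a `text` key.

```python
from typing import Callable, Dict, List

def compare_label_variants(
    predict: Callable[..., List[dict]],
    text: str,
    variants: List[str],
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Query the model once per label phrasing and collect the spans found.

    `predict` is expected to behave like GLiNER's `model.predict_entities`.
    Each variant is passed alone so the labels do not compete with each other.
    """
    found = {}
    for label in variants:
        entities = predict(text, [label], threshold=threshold)
        found[label] = sorted({e["text"] for e in entities})
    return found

def main() -> None:
    # Imported lazily so the helper above can be used without the model installed.
    from gliner import GLiNER  # pip install gliner

    # Assumed checkpoint id; substitute the model this discussion belongs to.
    model = GLiNER.from_pretrained("urchade/gliner_multi")
    text = "The Olympic games Paris 2024 will open in July."  # hypothetical sample
    variants = ["event", "evenement", "event evenement"]  # phrasings to compare
    for label, spans in compare_label_variants(model.predict_entities, text, variants).items():
        print(repr(label), spans)

# Call main() to run the comparison against the real model.
```

The same loop works for English versus same-language labels, for example "Drugname" against "drug name", on a small set of texts where the expected entities are known.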

And about multi-label in one string?

It should be possible, but it requires special decoding. For now I have only implemented flat NER and nested NER decoding.

Alright, thanks!
