How to include POS Tagging

#1
by efiore - opened

I have a fine-tuned model, but I still have not understood how I can feed POS tagging information into the model. Could you please help?
Thanks a lot!

efiore changed discussion status to closed
efiore changed discussion status to open
Arabic Language Technologies, Qatar Computing Research Institute org

You can use a TokenClassification pipeline to feed in a sentence and get POS predictions per token. Does the snippet in the README not work for you?
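For reference, a minimal sketch of that usage might look like the following (the model ID below is just a placeholder; substitute the actual repository name from the README):

```python
from transformers import pipeline

# Placeholder model ID -- replace it with the POS model repository from the README.
pos_tagger = pipeline("token-classification", model="your-org/your-pos-model")

# The pipeline returns one prediction (label, score, surface form, offsets) per token.
for prediction in pos_tagger("This is a simple example sentence."):
    print(prediction["word"], prediction["entity"], round(prediction["score"], 3))
```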

I have to feed (Hebrew) POS information into the model before I can ask it to predict POS,
but I do not know how to feed that POS information into the model.

Arabic Language Technologies, Qatar Computing Research Institute org

It looks like your question is more geared towards training (fine-tuning) a POS model rather than using this specific model? You should read up on the general Transformers docs and check out the examples for that (for instance https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification).
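In case it helps, here is a rough sketch of what fine-tuning a token-classification (POS) model along the lines of those examples could look like. The base checkpoint, the `universal_dependencies` / `he_htb` dataset, and its `tokens` / `upos` columns are assumptions made to keep the snippet self-contained; adapt them to your own data:

```python
from datasets import load_dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer, TrainingArguments)

# Assumed base model and dataset; swap in your own checkpoint and POS-annotated corpus.
model_name = "bert-base-multilingual-cased"
dataset = load_dataset("universal_dependencies", "he_htb")
label_list = dataset["train"].features["upos"].feature.names

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=len(label_list))

def tokenize_and_align(examples):
    # Tokenize pre-split words and copy each word's POS label onto its first subword;
    # remaining subwords and special tokens get -100 so the loss ignores them.
    tokenized = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, word_labels in enumerate(examples["upos"]):
        previous_word_id = None
        labels = []
        for word_id in tokenized.word_ids(batch_index=i):
            if word_id is None or word_id == previous_word_id:
                labels.append(-100)
            else:
                labels.append(word_labels[word_id])
            previous_word_id = word_id
        all_labels.append(labels)
    tokenized["labels"] = all_labels
    return tokenized

encoded = dataset.map(tokenize_and_align, batched=True,
                      remove_columns=dataset["train"].column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hebrew-pos-model", num_train_epochs=3),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```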

fdalvi changed discussion status to closed

Thanks a lot for your help!
One more question. The script you indicated seems to be made to "predict" POS.
What if I would like the model to predict a specific word (MLM) while also taking into consideration the POS information? (I have a dataset where every word has a lot of linguistic information associated with it.)
I would like the model to predict the MASKED word taking into consideration not only the single words (or tokens) but also all the linguistic information associated with each word.
I hope I am able to clearly explain what I need.
In case it is not clear, would it be possible to meet online?
Thanks again for your precious help!

Arabic Language Technologies, Qatar Computing Research Institute org

I understand what you are trying to do: you want to use the POS information as an additional input alongside the original words. I have not worked much on this, but take a look at multi-source/multi-input models (you should be able to find a lot of papers, especially from the pre-transformer era). The idea would be the same: you have multiple inputs that are combined into the same network, for example after the embedding layer. You can also try simply providing your input as token_0 token_1 token_2 [MASK] ... token_n [SEP] pos_0 pos_1 pos_2 [MASK] ... pos_n as a rudimentary baseline (this is how NLP entailment tasks are formatted, so you can search for those as well); a rough sketch of this baseline is shown below.
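A very rough sketch of what that concatenated input could look like with a fill-mask pipeline. The checkpoint and the example sentence are placeholders, and a plain pre-trained model will not actually exploit the POS half until you fine-tune it on inputs in this format:

```python
from transformers import pipeline

# Placeholder checkpoint; in practice you would fine-tune a masked LM on this input format first.
fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

mask = fill_mask.tokenizer.mask_token   # "[MASK]"
sep = fill_mask.tokenizer.sep_token     # "[SEP]"

tokens = ["the", "cat", mask, "on", "the", "mat"]
pos_tags = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]  # POS kept for the masked word, since it is known

# Rudimentary baseline: word sequence (with the target masked), then [SEP], then the POS sequence.
text = " ".join(tokens) + f" {sep} " + " ".join(pos_tags)

for candidate in fill_mask(text):
    print(candidate["token_str"], round(candidate["score"], 3))
```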

Good luck!

Thanks a lot!
