Convert tokens back to their original form

#28
by DAIEF - opened

Hello, I am wondering if it is possible to convert tokens back to their original words. For example, can the token 'solu' be converted back into 'solution'?

Beijing Academy of Artificial Intelligence org

Hi, Transformer models take tokens, not words, as input. The tokenizer may split a single word into multiple tokens.
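A lone token like 'solu' cannot be mapped back to a single word, because many words share that prefix. You can rebuild words only when you have the neighboring tokens from the same text. The sketch below is a minimal illustration of that merge. It assumes a SentencePiece-style vocabulary in which the marker "▁" starts a new word, as in XLM-RoBERTa-style tokenizers. The function name and the sample tokens are hypothetical. With Hugging Face fast tokenizers, `word_ids()` on the encoding, or `return_offsets_mapping=True`, gives you the same token-to-word grouping directly.

```python
# Assumption: SentencePiece-style tokens, where "▁" (U+2581) marks the start of a word.
WORD_START = "\u2581"


def merge_subwords(tokens: list[str]) -> list[str]:
    """Group subword tokens back into whole words (hypothetical helper)."""
    words: list[str] = []
    for tok in tokens:
        if tok.startswith(WORD_START) or not words:
            # A new word begins here, so drop the boundary marker.
            words.append(tok.lstrip(WORD_START))
        else:
            # A continuation piece is glued onto the previous word.
            words[-1] += tok
    return words


tokens = ["▁the", "▁solu", "tion", "▁works"]
print(merge_subwords(tokens))  # ['the', 'solution', 'works']
```

Special tokens such as `<s>` and `</s>` should be filtered out before merging.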

Thank you very much! Do you think your model would be useful for evaluating the quality of generated/extracted keywords? Here is what I have in mind. For each document in a document list, I generate/extract a list of keywords. I then embed all the keywords and documents using the ColBERT vectors, and maybe the other two representations as well. Finally, I compute a score for every keyword against every document. If a keyword's source document gets the highest score, I can say that the keyword represents its document well.
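The scoring step described above can be sketched as follows. This is a minimal sketch that assumes ColBERT-style late interaction (MaxSim). Each keyword token is matched to its most similar document token, and those best matches are averaged over the keyword's tokens. Summing instead of averaging, as in the original ColBERT formulation, gives the same ranking of documents for a fixed keyword. The function names, document names, and toy 2-D vectors are hypothetical. Real vectors would come from the model's ColBERT output for each keyword and each document.

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query: list[Vector], doc: list[Vector]) -> float:
    """Late-interaction score: best document match per query token, averaged."""
    q = [normalize(v) for v in query]
    d = [normalize(v) for v in doc]
    return sum(max(dot(qv, dv) for dv in d) for qv in q) / len(q)


def source_ranks_first(keyword: list[Vector], docs: dict[str, list[Vector]], source: str) -> bool:
    """True if the keyword's own source document scores highest among all docs."""
    scores = {name: maxsim(keyword, emb) for name, emb in docs.items()}
    return max(scores, key=lambda name: scores[name]) == source


# Hypothetical token-level embeddings (one vector per token).
docs = {
    "doc_a": [[1.0, 0.0], [0.9, 0.1]],
    "doc_b": [[0.0, 1.0], [0.1, 0.9]],
}
keyword = [[1.0, 0.1]]  # a one-token keyword extracted from doc_a
print(source_ranks_first(keyword, docs, "doc_a"))  # True
```

Counting how often each document's keywords rank their source first turns this check into a simple quality metric for a keyword list.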

What do you think of this approach? I look forward to your insights.

Beijing Academy of Artificial Intelligence org

It seems like a feasible approach. However, we haven't tried this type of task before, so we can't offer more specific advice.
