Words are not merged during tokenization
#2, opened by gaetokk
First of all, thank you so much for your contributions and sharing.
I think I found an issue while trying to use the model you trained.
Unlike the original phi-2 model, your tokenizer does not convert a word to a single token id even though the corresponding merge exists in merges.txt; the word is split into several tokens instead. I think this could make training less efficient, since the same text takes more tokens. I'm wondering whether this is intended or a bug.
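To make clear what I mean by "merged", here is a toy sketch of BPE encoding in plain Python. It is not the actual tokenizer code, and the function `bpe_encode`, the merge rules, and the word "hello" are all made up for illustration. It only shows the behavior I expected (every merge applied, one token) next to the behavior I am seeing (some merges not applied, several tokens). I am not claiming this is the cause in your tokenizer.

```python
def bpe_encode(word, merges):
    """Toy BPE: repeatedly merge the adjacent pair with the best (lowest) rank."""
    # Rank = position in the merge list, as in merges.txt (earlier = higher priority).
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no applicable merge rule is left
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge rules, listed in merges.txt order.
merges = [("h", "e"), ("l", "l"), ("he", "ll"), ("hell", "o")]

# Expected: all merges are applied and the word becomes one token.
full = bpe_encode("hello", merges)

# What I observe looks more like this: later merges are not applied.
partial = bpe_encode("hello", merges[:2])

print(full, partial)
```

With the full merge list the word ends up as a single symbol, while with the truncated list it stays split into three pieces, which is the kind of difference I see between the original phi-2 tokenizer and yours.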
Here is the code I tried to run.
Thank you.