tokenizer.model file

#10
by hanisaf - opened

I'm trying to convert the model to GGML, but the tokenizer.model file is not included. Using the LLaMA 2 tokenizer.model results in the error "Expected added token IDs to be sequential". I would appreciate a pointer to the correct tokenizer.model file.

Isn't it the original model?
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")

Correct, but I don't know how to save it into the tokenizer.model file required by the convert.py script.
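For later readers, one likely explanation (hedged, based on how the llama.cpp converter and phi-1_5 are set up): convert.py expects a SentencePiece `tokenizer.model`, while phi-1_5 ships a GPT-2-style BPE tokenizer serialized as `vocab.json` + `merges.txt`, so there is no `tokenizer.model` to save in the first place. A small hypothetical helper to classify which serialization a downloaded model directory uses:

```python
from pathlib import Path

def tokenizer_kind(model_dir: str) -> str:
    """Classify which tokenizer serialization a model directory uses."""
    d = Path(model_dir)
    if (d / "tokenizer.model").is_file():
        return "sentencepiece"  # what llama.cpp's convert.py expects
    if (d / "vocab.json").is_file() and (d / "merges.txt").is_file():
        return "bpe"            # GPT-2 style, what phi-1_5 ships
    return "unknown"

# toy demo with a locally created directory standing in for a snapshot
demo = Path("demo_model")
demo.mkdir(exist_ok=True)
(demo / "vocab.json").write_text("{}")
(demo / "merges.txt").write_text("#version: 0.2\n")
print(tokenizer_kind("demo_model"))
```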

This is not a LLaMA model, so you cannot use llama.cpp or ggml to convert it yet. You will first have to add support for the MixFormerSequentialForCausalLM model type in the ggml library.
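To see why the converter refuses, it helps to look at the `architectures` field in the model's config.json. The snippet below uses a minimal hypothetical stand-in for that file; the architecture string itself is the one this thread names:

```python
import json

# Minimal stand-in for phi-1_5's config.json (hypothetical file contents;
# only the "architectures" field matters here).
config = json.loads('{"architectures": ["MixFormerSequentialForCausalLM"]}')
arch = config["architectures"][0]

# llama.cpp's convert.py only handles the llama family, so any other
# architecture needs support added to ggml first.
supported_by_llama_cpp = arch == "LlamaForCausalLM"
print(arch, supported_by_llama_cpp)
```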

Thanks a lot for this information, and sorry for the newbie question. I would appreciate any link or tutorial for doing so. A Google search led me to https://github.com/guidance-ai/guidance/issues/58, but I could not get much out of it.

Porting the model to GGML won't be easy, but it's not impossible. You can take a look at how GPT-J is implemented here (refer to the HF implementation here), then try to adapt the Phi model the same way. The modeling code for Phi is in this repo.
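Concretely, a port has two halves: a conversion script that serializes every tensor into ggml's binary layout, and C++ inference code mirroring the Python modeling file. As a rough sketch of the serialization half only (the header fields and dtype tags here follow the pattern of the ggml example converters, but the exact layout varies per example project; this is not a real phi converter):

```python
import struct
from array import array

def write_ggml_tensor(fout, name, shape, data_f32):
    """Serialize one float32 tensor in the layout the ggml example
    converters use: n_dims, name length, dtype tag, dims, name, raw data."""
    name_bytes = name.encode("utf-8")
    # dtype tag 0 = float32 in the ggml example scripts' convention
    fout.write(struct.pack("iii", len(shape), len(name_bytes), 0))
    for dim in reversed(shape):  # ggml stores dimensions innermost-first
        fout.write(struct.pack("i", dim))
    fout.write(name_bytes)
    array("f", data_f32).tofile(fout)

with open("model.ggml.bin", "wb") as fout:
    fout.write(struct.pack("i", 0x67676D6C))  # "ggml" magic; rest of the
    # real header (hparams, vocab) is omitted in this sketch
    # toy 4x2 tensor with a hypothetical name, standing in for real weights
    write_ggml_tensor(fout, "wte.weight", [4, 2], [0.0] * 8)
```

The real work is then reimplementing the forward pass from the modeling file with ggml ops on the C++ side, as the GPT-J example does.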

gugarosa changed discussion status to closed
