how did you convert `transformers.PreTrainedTokenizer` to ggml format?

#2
by keunwoochoi - opened

Can you share how you did it? I am trying to convert my custom language model to ggml, but I also use a `tokenizers.Tokenizer` that I trained on my own corpus.
I could get merges.txt and vocab.json, but I don't know how to convert them to a tokenizer.model file, which seems to be the only format the ggml converter is compatible with.

thanks!

You need to add support for your model architecture to ggml yourself - see https://github.com/ggerganov/ggml/tree/master/examples
There is no magical recipe. You can also look at https://github.com/OpenNMT/CTranslate2 as an alternative.
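
For the tokenizer part specifically, here is a rough sketch of one way to dump a byte-level BPE vocab.json into a binary file that a custom loader could read. Everything about the layout is an assumption for illustration (little-endian int32 token count, then an int32 byte length plus raw bytes per token, ordered by token id); `write_vocab` and `read_vocab` are hypothetical helper names, not part of ggml. Whatever layout you pick has to match what your own C++ loading code expects, and merges.txt would need similar handling if your loader applies BPE merges.

```python
import io
import json
import struct


def bytes_to_unicode():
    """GPT-2 style table mapping each byte value to a printable unicode character."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def write_vocab(vocab, fout):
    """Write the vocab in the assumed layout: count, then (length, bytes) per token."""
    byte_decoder = {ch: b for b, ch in bytes_to_unicode().items()}
    fout.write(struct.pack("<i", len(vocab)))
    # Sort by id so that a token's position in the file equals its id.
    for token, _ in sorted(vocab.items(), key=lambda kv: kv[1]):
        if all(c in byte_decoder for c in token):
            # Undo the byte-level encoding (e.g. "Ġ" becomes a space byte).
            raw = bytes(byte_decoder[c] for c in token)
        else:
            # Tokens outside the byte-level alphabet are stored as plain UTF-8.
            raw = token.encode("utf-8")
        fout.write(struct.pack("<i", len(raw)))
        fout.write(raw)


def read_vocab(fin):
    """Read the tokens back, mirroring write_vocab (useful as a sanity check)."""
    (count,) = struct.unpack("<i", fin.read(4))
    tokens = []
    for _ in range(count):
        (length,) = struct.unpack("<i", fin.read(4))
        tokens.append(fin.read(length))
    return tokens


if __name__ == "__main__":
    # With a real tokenizer you would load the file instead:
    #   with open("vocab.json", encoding="utf-8") as f:
    #       vocab = json.load(f)
    vocab = json.loads('{"hello": 0, "\\u0120world": 1}')
    buf = io.BytesIO()
    write_vocab(vocab, buf)
    buf.seek(0)
    print(read_vocab(buf))
```

Reading the file back in Python before touching the C++ side is a cheap way to check that the byte-level decoding did what you expected.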
