Do you have a guide to convert this to GGUF/GGML format?

#4
by qhkm

Hi there! This is super cool! I saw it on Twitter and was amazed by the performance compared to sentence embeddings. I would love to use this in llama.cpp, so I would need to convert it to GGUF. Do you have any idea how to do that? Thanks!

I would not recommend using this with llama.cpp. It's a BERT model, so I looked into bert.cpp, but I don't really see the benefit of that over ONNX. I provided ONNX checkpoints, so you should just use those. Many of the advantages of llama.cpp are relevant to text generation, not so much to embeddings.
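For reference, here is a minimal sketch of running an ONNX checkpoint for embeddings with Hugging Face Optimum and ONNX Runtime. The repository id and the mean-pooling step are placeholders for illustration, not details from this thread; check the model card for the actual repo and pooling strategy.

```python
# Minimal sketch: sentence embeddings from an ONNX checkpoint via Optimum + ONNX Runtime.
# "user/embedding-model" is a hypothetical repo id; substitute the actual one.
import torch
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction

repo_id = "user/embedding-model"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = ORTModelForFeatureExtraction.from_pretrained(repo_id)  # loads the ONNX export

sentences = ["This is an example sentence.", "Each sentence gets one embedding."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool token embeddings over the attention mask (a common choice;
# the model card may specify a different pooling, e.g. CLS).
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (2, hidden_size)
```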
