Typesense Public Embedding Models
We maintain a repository of currently supported embedding models, and we welcome contributions from the community. If you have a model that you would like to add to our supported list, you can convert it to the ONNX format and create a Pull Request (PR) to include it.
Convert a model to ONNX format
Converting a Hugging Face Transformers Model
To convert any model from Hugging Face to ONNX format, you can follow the instructions in this link using the optimum-cli.
Converting a PyTorch Model
If you have a PyTorch model, you can use the torch.onnx APIs to convert it to the ONNX format. More information on the conversion process can be found here.
Converting a TensorFlow Model
For TensorFlow models, you can use the tf2onnx tool to convert them to the ONNX format. Detailed guidance on this conversion can be found here.
Creating model config
Before submitting your ONNX model through a PR, you need to organize the necessary files under a folder with the model's name. Ensure that your model configuration adheres to the following structure:
- Model File: The ONNX model file.
- Vocab File: The vocabulary file required for the model.
- Model Config File: Named as config.json, this file should contain the following keys:
| Key | Description | Optional |
|---|---|---|
| model_md5 | MD5 checksum of model file as string | No |
| vocab_md5 | MD5 checksum of vocab file as string | No |
| model_type | Model type (currently only bert and xlm_roberta supported) | No |
| vocab_file_name | File name of vocab file | No |
| indexing_prefix | Prefix to be added before embedding documents | Yes |
| query_prefix | Prefix to be added before embedding queries | Yes |
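The keys listed above can be generated rather than typed by hand, which avoids checksum typos. The following is a minimal sketch using only the Python standard library; the helper names (`md5_of_file`, `build_config`, `write_config`) are hypothetical and not part of any Typesense tooling, and the sketch assumes the checksums are the usual lowercase hexadecimal MD5 digests.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional

# Model types the section above lists as currently supported.
SUPPORTED_MODEL_TYPES = ("bert", "xlm_roberta")


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_config(
    model_path: Path,
    vocab_path: Path,
    model_type: str,
    indexing_prefix: Optional[str] = None,
    query_prefix: Optional[str] = None,
) -> dict:
    """Assemble the config.json contents for one model folder (hypothetical helper)."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported model_type: {model_type!r}")
    config = {
        "model_md5": md5_of_file(model_path),
        "vocab_md5": md5_of_file(vocab_path),
        "model_type": model_type,
        "vocab_file_name": vocab_path.name,
    }
    # The two prefixes are optional, so they are written only when provided.
    if indexing_prefix is not None:
        config["indexing_prefix"] = indexing_prefix
    if query_prefix is not None:
        config["query_prefix"] = query_prefix
    return config


def write_config(model_dir: Path, config: dict) -> Path:
    """Write the config as config.json inside the model folder and return its path."""
    config_path = model_dir / "config.json"
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config_path
```

For a folder named `my-model` containing `model.onnx` and `vocab.txt` (all three names are placeholders), calling `write_config(Path("my-model"), build_config(Path("my-model/model.onnx"), Path("my-model/vocab.txt"), "bert"))` would produce the config file next to the model and vocab files.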
Please make sure that the information in the configuration file is accurate and complete before submitting your PR.
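One way to check accuracy and completeness before opening a PR is to re-read the folder and compare it against its own config.json. The sketch below is an illustration under stated assumptions, not an official validator: `check_model_dir` is a hypothetical helper, and because the config has no key for the model file name, the caller passes that name in explicitly.

```python
import hashlib
import json
from pathlib import Path
from typing import List

REQUIRED_KEYS = ("model_md5", "vocab_md5", "model_type", "vocab_file_name")
SUPPORTED_MODEL_TYPES = ("bert", "xlm_roberta")


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of a file's bytes."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def check_model_dir(model_dir: Path, model_file_name: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    config_path = model_dir / "config.json"
    if not config_path.is_file():
        return ["config.json is missing"]
    config = json.loads(config_path.read_text(encoding="utf-8"))

    problems = []
    # Every required key must be present and hold a non-empty string.
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty required key: {key}")

    model_type = config.get("model_type")
    if isinstance(model_type, str) and model_type not in SUPPORTED_MODEL_TYPES:
        problems.append(f"unsupported model_type: {model_type}")

    # Compare each recorded checksum with the file actually in the folder.
    checks = (
        ("model_md5", model_file_name),
        ("vocab_md5", config.get("vocab_file_name")),
    )
    for md5_key, file_name in checks:
        if not isinstance(file_name, str) or not file_name:
            continue  # already reported as a missing key above
        path = model_dir / file_name
        if not path.is_file():
            problems.append(f"file not found: {file_name}")
        elif config.get(md5_key) != file_md5(path):
            problems.append(f"{md5_key} does not match {file_name}")
    return problems
```

Running `check_model_dir(Path("my-model"), "model.onnx")` (placeholder names) and getting back an empty list indicates that the required keys are present and both checksums match the files on disk.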
We appreciate your contributions to expand our collection of supported embedding models!