---
license: gpl-3.0
tags:
- typesense
- semantic search
- vector search
---

# Typesense Built-in Embedding Models

This repository holds all the built-in ML models currently supported by [Typesense](https://typesense.org) for semantic search.

If you have a model that you would like to add to our supported list, you can convert it to the ONNX format and create a Pull Request (PR) to include it. (See below for instructions.)

## Usage

Here's an example of how to specify the model to use for auto-embedding generation when creating a collection in Typesense:

```bash
curl -X POST \
  'http://localhost:8108/collections' \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{
    "name": "products",
    "fields": [
      {
        "name": "product_name",
        "type": "string"
      },
      {
        "name": "embedding",
        "type": "float[]",
        "embed": {
          "from": ["product_name"],
          "model_config": {
            "model_name": "ts/all-MiniLM-L12-v2"
          }
        }
      }
    ]
  }'
```

Replace `all-MiniLM-L12-v2` with any model name from this repository.

Here's a detailed step-by-step article with more information: https://typesense.org/docs/guide/semantic-search.html

## Contributing

If you have a model that you would like to add to our supported list, convert it to the ONNX format and create a Pull Request (PR) to include it, following the instructions below.

### Convert a model to ONNX format

#### Converting a Hugging Face Transformers Model

To convert any model from Hugging Face to the ONNX format, follow the instructions in [this link](https://huggingface.co/docs/transformers/serialization#export-to-onnx) using the `optimum-cli`.

#### Converting a PyTorch Model

If you have a PyTorch model, you can use the `torch.onnx` APIs to convert it to the ONNX format. More information on the conversion process can be found [here](https://pytorch.org/docs/stable/onnx.html).

#### Converting a TensorFlow Model

For TensorFlow models, you can use the `tf2onnx` tool to convert them to the ONNX format.
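As a rough sketch (the paths and opset version below are placeholders, assuming your model is a TensorFlow SavedModel in `./saved_model`), a `tf2onnx` conversion typically looks like:

```shell
# Install the converter (assumes Python and TensorFlow are already available)
pip install -U tf2onnx

# Convert the SavedModel directory to ONNX;
# --opset pins the ONNX operator set version used in the exported graph
python -m tf2onnx.convert --saved-model ./saved_model --opset 17 --output model.onnx
```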
Detailed guidance on this conversion can be found [here](https://onnxruntime.ai/docs/tutorials/tf-get-started.html#getting-started-converting-tensorflow-to-onnx).

#### Creating the model config

Before submitting your ONNX model through a PR, organize the necessary files under a folder named after the model. The folder must contain the following:

- **Model File**: The ONNX model file.
- **Vocab File**: The vocabulary file required by the model.
- **Model Config File**: Named `config.json`, this file should contain the following keys:

| Key | Description | Optional |
|-----|-------------|----------|
| `model_md5` | MD5 checksum of the model file, as a string | No |
| `vocab_md5` | MD5 checksum of the vocab file, as a string | No |
| `model_type` | Model type (currently only `bert` and `xlm_roberta` are supported) | No |
| `vocab_file_name` | File name of the vocab file | No |
| `indexing_prefix` | Prefix added to documents before embedding | Yes |
| `query_prefix` | Prefix added to queries before embedding | Yes |

Please make sure the information in the configuration file is accurate and complete before submitting your PR.

We appreciate your contributions to expand our collection of supported embedding models!
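Putting the keys above together, a minimal `config.json` might look like the sketch below. The checksums, file name, and prefixes are placeholders, not values for any real model; compute the actual checksums from your files, e.g. with `md5sum model.onnx vocab.txt`.

```json
{
  "model_md5": "0123456789abcdef0123456789abcdef",
  "vocab_md5": "fedcba9876543210fedcba9876543210",
  "model_type": "bert",
  "vocab_file_name": "vocab.txt",
  "indexing_prefix": "passage:",
  "query_prefix": "query:"
}
```

The two prefix keys are only needed for models trained with such prefixes; omit them otherwise, since they are optional.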