---
license: gpl-3.0
tags:
- typesense
- semantic search
- vector search
---

# Typesense Built-in Embedding Models

This repository holds the built-in ML models currently supported by [Typesense](https://typesense.org) for semantic search.

If you have a model that you would like to add to our supported list, you can convert it to the ONNX format and open a Pull Request (PR) to include it. See the Contributing section below for instructions.

## Usage

Here's an example of how to specify the model to use for auto-embedding generation when creating a collection in Typesense:

```bash
curl -X POST \
  'http://localhost:8108/collections' \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{
    "name": "products",
    "fields": [
      {
        "name": "product_name",
        "type": "string"
      },
      {
        "name": "embedding",
        "type": "float[]",
        "embed": {
          "from": [
            "product_name"
          ],
          "model_config": {
            "model_name": "ts/all-MiniLM-L12-v2"
          }
        }
      }
    ]
  }'
```

Replace `all-MiniLM-L12-v2` with any model name from this repository, keeping the `ts/` prefix, which tells Typesense to use one of its built-in models.

Here's a detailed step-by-step article with more information: https://typesense.org/docs/guide/semantic-search.html
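
Once documents are indexed into the collection above, Typesense generates the `embedding` field automatically from `product_name`, and a semantic search simply points `query_by` at that field. Here's a rough sketch (the endpoints and parameters follow the standard Typesense documents API; the document contents, the query text, and `exclude_fields` — used only to keep the raw vectors out of the response — are illustrative):

```bash
# Index a document; the embedding is generated server-side from product_name
curl -X POST 'http://localhost:8108/collections/products/documents' \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{"product_name": "Insulated winter jacket"}'

# Semantic search against the auto-generated embedding field
curl "http://localhost:8108/collections/products/documents/search?q=warm+coat&query_by=embedding&exclude_fields=embedding" \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
```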

## Contributing

If you have a model that you would like to add to our supported list, you can convert it to the ONNX format and open a Pull Request (PR) to include it, following the instructions below.

### Convert a model to ONNX format

#### Converting a Hugging Face Transformers Model

To convert any model from Hugging Face to the ONNX format, you can follow the instructions in [this link](https://huggingface.co/docs/transformers/serialization#export-to-onnx) using `optimum-cli`.
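
For example, exporting a sentence-transformers model could look like this (a sketch: the model ID and output directory are placeholders, and the Optimum exporter extras need to be installed first):

```bash
# Install the Hugging Face Optimum exporter tooling
pip install "optimum[exporters]"

# Export a Hugging Face model to ONNX into a local directory
optimum-cli export onnx --model sentence-transformers/all-MiniLM-L12-v2 all-MiniLM-L12-v2-onnx/
```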

#### Converting a PyTorch Model

If you have a PyTorch model, you can use the `torch.onnx` APIs to export it to the ONNX format. More information on the conversion process can be found [here](https://pytorch.org/docs/stable/onnx.html).
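
As an illustration only (the module below is a stand-in; a real embedding model would be exported with its tokenized inputs, input/output names, and dynamic axes), the core call is `torch.onnx.export`:

```bash
python - <<'PY'
import torch

# Stand-in module; replace with your actual PyTorch model
model = torch.nn.Linear(384, 384).eval()
dummy_input = torch.randn(1, 384)  # example input with the shape the model expects

# Trace the model with the dummy input and write out an ONNX graph
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])
PY
```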

#### Converting a TensorFlow Model

For TensorFlow models, you can use the tf2onnx tool to convert them to the ONNX format. Detailed guidance on this conversion can be found [here](https://onnxruntime.ai/docs/tutorials/tf-get-started.html#getting-started-converting-tensorflow-to-onnx).
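
For a model saved in TensorFlow's SavedModel format, the conversion is typically a single command (the paths below are placeholders):

```bash
pip install tf2onnx

# Convert a TensorFlow SavedModel directory to an ONNX file
python -m tf2onnx.convert --saved-model ./my_saved_model --output model.onnx
```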

#### Creating model config

Before submitting your ONNX model through a PR, you need to organize the necessary files under a folder named after the model. Ensure that your submission adheres to the following structure (an illustrative `config.json` is sketched after the table):

- **Model File**: The ONNX model file.
- **Vocab File**: The vocabulary file required for the model.
- **Model Config File**: Named `config.json`, this file should contain the following keys:

| Key | Description | Optional |
|-----|-------------|----------|
| `model_md5` | MD5 checksum of the model file, as a string | No |
| `vocab_md5` | MD5 checksum of the vocab file, as a string | No |
| `model_type` | Model type (currently only `bert` and `xlm_roberta` are supported) | No |
| `vocab_file_name` | File name of the vocab file | No |
| `indexing_prefix` | Prefix added before embedding documents | Yes |
| `query_prefix` | Prefix added before embedding queries | Yes |
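
To make the expected keys concrete, here is an illustrative `config.json`. The checksum values are placeholders (compute the real ones with a tool such as `md5sum`), and the prefixes are optional and only needed if the model was trained with them:

```json
{
  "model_md5": "<md5 checksum of the .onnx model file>",
  "vocab_md5": "<md5 checksum of the vocab file>",
  "model_type": "bert",
  "vocab_file_name": "vocab.txt",
  "indexing_prefix": "passage:",
  "query_prefix": "query:"
}
```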

Please make sure that the information in the configuration file is accurate and complete before submitting your PR.

We appreciate your contributions to expand our collection of supported embedding models!