---
license: gpl-3.0
tags:
- typesense
- semantic search
- vector search
---
# Typesense Built-in Embedding Models
This repository holds all the built-in ML models currently supported by [Typesense](https://typesense.org) for semantic search.
If you have a model that you would like to add to our supported list, you can convert it to the ONNX format and create a Pull Request (PR) to include it (see the Contributing section below for instructions).
## Usage
Here's an example of how to specify the model to use for auto-embedding generation when creating a collection in Typesense:
```bash
curl -X POST \
  'http://localhost:8108/collections' \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{
    "name": "products",
    "fields": [
      {
        "name": "product_name",
        "type": "string"
      },
      {
        "name": "embedding",
        "type": "float[]",
        "embed": {
          "from": [
            "product_name"
          ],
          "model_config": {
            "model_name": "ts/all-MiniLM-L12-v2"
          }
        }
      }
    ]
  }'
```
Replace `ts/all-MiniLM-L12-v2` with any model name from this repository (keeping the `ts/` prefix).
Here's a detailed step-by-step article with more information: [Semantic Search guide](https://typesense.org/docs/guide/semantic-search.html).
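Once documents are added to the collection, Typesense generates their embeddings automatically and you can issue a semantic search against the `embedding` field. A minimal sketch (the query text and collection name are just examples):
```bash
# Semantic search over the auto-embedded field; Typesense embeds the query text
# with the same model configured on the collection.
curl -G \
  'http://localhost:8108/collections/products/documents/search' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  --data-urlencode 'q=winter wear' \
  --data-urlencode 'query_by=embedding'
```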
## Contributing
If you have a model that you would like to add to our supported list, convert it to the ONNX format and create a Pull Request (PR) to include it by following the instructions below.
### Convert a model to ONNX format
#### Converting a Hugging Face Transformers Model
To convert any model from Hugging Face to the ONNX format, you can follow the instructions in [the Transformers export guide](https://huggingface.co/docs/transformers/serialization#export-to-onnx) using `optimum-cli`.
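For example, a sentence-transformers model could be exported roughly like this (the model id, task, and output directory are illustrative):
```bash
# Install the exporter tooling and export the model to ONNX.
pip install "optimum[exporters]"
optimum-cli export onnx \
  --model sentence-transformers/all-MiniLM-L12-v2 \
  --task feature-extraction \
  all-MiniLM-L12-v2-onnx/
```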
#### Converting a PyTorch Model
If you have a PyTorch model, you can use the `torch.onnx` APIs to convert it to the ONNX format. More information on the conversion process can be found [here](https://pytorch.org/docs/stable/onnx.html).
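Since `torch.onnx` is a Python API, one way to run the export is from an inline script. The sketch below uses a stand-in module and made-up tensor shapes; replace them with your actual model and example inputs:
```bash
python - <<'EOF'
import torch

# Stand-in module for illustration; substitute your real embedding model.
model = torch.nn.Sequential(torch.nn.Linear(384, 384))
model.eval()

# Example input matching the model's expected shape.
dummy_input = torch.randn(1, 384)

# Export to ONNX with a dynamic batch dimension.
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
EOF
```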
#### Converting a TensorFlow Model
For TensorFlow models, you can use the `tf2onnx` tool to convert them to the ONNX format. Detailed guidance on this conversion can be found [here](https://onnxruntime.ai/docs/tutorials/tf-get-started.html#getting-started-converting-tensorflow-to-onnx).
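A minimal sketch of driving `tf2onnx` from the command line; the SavedModel path, output file name, and opset version are illustrative:
```bash
# Convert a TensorFlow SavedModel directory to a single ONNX file.
pip install tf2onnx
python -m tf2onnx.convert \
  --saved-model ./my_saved_model \
  --output model.onnx \
  --opset 17
```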
#### Creating model config
Before submitting your ONNX model through a PR, you need to organize the necessary files under a folder with the model's name. Ensure that your model configuration adheres to the following structure:
- **Model File**: The ONNX model file.
- **Vocab File**: The vocabulary file required for the model.
- **Model Config File**: Named `config.json`, this file should contain the following keys:
| Key | Description | Optional |
|-----|-------------|----------|
|model_md5| MD5 checksum of the model file, as a string | No |
|vocab_md5| MD5 checksum of the vocab file, as a string | No |
|model_type| Model type (currently only `bert` and `xlm_roberta` are supported) | No |
|vocab_file_name| File name of the vocab file | No |
|indexing_prefix| Prefix prepended to document text before embedding | Yes |
|query_prefix| Prefix prepended to query text before embedding | Yes |
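Putting it together, a model folder might look roughly like the sketch below. The file names, checksum values, and prefix strings are purely illustrative (the two prefix keys are optional), and `md5sum` is just one way to compute the checksums:
```bash
# Compute the checksums that go into config.json (file names are illustrative).
md5sum model.onnx vocab.txt

# Example config.json for a hypothetical BERT-based model.
cat > config.json <<'EOF'
{
  "model_md5": "0123456789abcdef0123456789abcdef",
  "vocab_md5": "abcdef0123456789abcdef0123456789",
  "model_type": "bert",
  "vocab_file_name": "vocab.txt",
  "indexing_prefix": "passage: ",
  "query_prefix": "query: "
}
EOF
```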
Please make sure that the information in the configuration file is accurate and complete before submitting your PR.
We appreciate your contributions to expanding our collection of supported embedding models!