---
license: gpl-3.0
tags:
- typesense
- semantic search
- vector search
---

# Typesense Built-in Embedding Models

This repository holds all the built-in ML models that [Typesense](https://typesense.org) currently supports for semantic search.

If you would like to add a model to the supported list, you can convert it to the ONNX format and open a Pull Request (PR) to include it (see the Contributing section below).

## Usage

Here's an example of how to specify the model to use for auto-embedding generation when creating a collection in Typesense:

```bash
curl -X POST \
  'http://localhost:8108/collections' \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{
        "name": "products",
        "fields": [
          {
            "name": "product_name",
            "type": "string"
          },
          {
            "name": "embedding",
            "type": "float[]",
            "embed": {
              "from": [
                "product_name"
              ],
              "model_config": {
                "model_name": "ts/all-MiniLM-L12-v2"
              }
            }
          }
        ]
      }'
```

Replace `all-MiniLM-L12-v2` with the name of any model in this repository, keeping the `ts/` prefix shown in the example.

Here's a detailed step-by-step article with more information: https://typesense.org/docs/guide/semantic-search.html
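
If you create collections from a script rather than with `curl`, the same request body can be built programmatically. The following is a minimal sketch, not an official client: `build_collection_schema` is a hypothetical helper name, and it only assembles the JSON payload shown above without sending it anywhere.

```python
import json


def build_collection_schema(model_name, collection="products",
                            source_field="product_name"):
    """Return a schema dict whose `embedding` field is auto-generated.

    `model_name` is the folder name of a model in this repository;
    the `ts/` prefix is added here, matching the curl example above.
    """
    return {
        "name": collection,
        "fields": [
            {"name": source_field, "type": "string"},
            {
                "name": "embedding",
                "type": "float[]",
                "embed": {
                    "from": [source_field],
                    "model_config": {"model_name": f"ts/{model_name}"},
                },
            },
        ],
    }


if __name__ == "__main__":
    # Print the payload; pass it as the body of POST /collections.
    print(json.dumps(build_collection_schema("all-MiniLM-L12-v2"), indent=2))
```

The printed JSON can be used as the `-d` argument of the `curl` command above, or sent with whichever HTTP client you prefer.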

## Contributing

To add a model to the supported list, convert it to the ONNX format, prepare its config as described below, and open a Pull Request (PR) to include it.

### Convert a model to ONNX format

#### Converting a Hugging Face Transformers Model
To convert a Hugging Face Transformers model to ONNX format, follow the [Transformers ONNX export guide](https://huggingface.co/docs/transformers/serialization#export-to-onnx), which uses `optimum-cli`.

#### Converting a PyTorch Model
If you have a PyTorch model, you can use the `torch.onnx` APIs to convert it to the ONNX format. The [PyTorch ONNX documentation](https://pytorch.org/docs/stable/onnx.html) describes the conversion process.

#### Converting a TensorFlow Model
For TensorFlow models, you can use the `tf2onnx` tool to convert them to the ONNX format. Detailed guidance is available in the [ONNX Runtime TensorFlow tutorial](https://onnxruntime.ai/docs/tutorials/tf-get-started.html#getting-started-converting-tensorflow-to-onnx).

#### Creating model config 

Before submitting your ONNX model in a PR, place the necessary files in a folder named after the model. The folder should contain the following:

  - **Model File**: The ONNX model file.
  - **Vocab File**: The vocabulary file required for the model.
  - **Model Config File**: Named `config.json`, this file should contain the following keys:

| Key | Description | Optional |
|-----|-------------|----------|
| `model_md5` | MD5 checksum of the model file, as a string | No |
| `vocab_md5` | MD5 checksum of the vocab file, as a string | No |
| `model_type` | Model type (currently only `bert` and `xlm_roberta` are supported) | No |
| `vocab_file_name` | File name of the vocab file | No |
| `indexing_prefix` | Prefix to be added before embedding documents | Yes |
| `query_prefix` | Prefix to be added before embedding queries | Yes |


Please make sure that the information in the configuration file is accurate and complete before submitting your PR.
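
One way to keep the checksums accurate is to generate `config.json` with a script and re-check it before opening the PR. The sketch below is an illustration under stated assumptions, not part of Typesense: the helper names (`md5_of`, `build_config`, `check_config`) and the file names `model.onnx` and `vocab.txt` are hypothetical, and only the keys come from the table above.

```python
import hashlib
import json
from pathlib import Path

REQUIRED_KEYS = ("model_md5", "vocab_md5", "model_type", "vocab_file_name")
SUPPORTED_MODEL_TYPES = ("bert", "xlm_roberta")


def md5_of(path):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_config(model_dir, model_file, vocab_file, model_type,
                 indexing_prefix=None, query_prefix=None):
    """Build the config dict for the files in `model_dir`."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported model_type: {model_type}")
    model_dir = Path(model_dir)
    config = {
        "model_md5": md5_of(model_dir / model_file),
        "vocab_md5": md5_of(model_dir / vocab_file),
        "model_type": model_type,
        "vocab_file_name": vocab_file,
    }
    # Optional keys are written only when a prefix is given.
    if indexing_prefix is not None:
        config["indexing_prefix"] = indexing_prefix
    if query_prefix is not None:
        config["query_prefix"] = query_prefix
    return config


def check_config(model_dir, model_file):
    """Return a list of problems in model_dir/config.json (empty if none)."""
    model_dir = Path(model_dir)
    config = json.loads((model_dir / "config.json").read_text())
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS
                if key not in config]
    if problems:
        return problems
    if config["model_type"] not in SUPPORTED_MODEL_TYPES:
        problems.append("unsupported model_type")
    if md5_of(model_dir / model_file) != config["model_md5"]:
        problems.append("model_md5 mismatch")
    if md5_of(model_dir / config["vocab_file_name"]) != config["vocab_md5"]:
        problems.append("vocab_md5 mismatch")
    return problems
```

For example, `build_config("my-model", "model.onnx", "vocab.txt", "bert")` returns a dict that can be written to `my-model/config.json` with `json.dump`, and `check_config("my-model", "model.onnx")` can be rerun after any change to the model or vocab file.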

We appreciate your contributions to expand our collection of supported embedding models!