	---
	license: gpl-3.0
	tags:
	- typesense
	- semantic search
	- vector search
	---

	# Typesense Built-in Embedding Models

	This repository holds all the built-in ML models that [Typesense](https://typesense.org) currently supports for semantic search.

	If you have a model that you would like to add to our supported list, you can convert it to the ONNX format and create a Pull Request (PR) to include it. See [Contributing](#contributing) below for instructions.

	## Usage

	Here's an example of how to specify the model to use for auto-embedding generation when creating a collection in Typesense:

	```bash
	curl -X POST \
	  'http://localhost:8108/collections' \
	  -H 'Content-Type: application/json' \
	  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
	  -d '{
	    "name": "products",
	    "fields": [
	      {
	        "name": "product_name",
	        "type": "string"
	      },
	      {
	        "name": "embedding",
	        "type": "float[]",
	        "embed": {
	          "from": [
	            "product_name"
	          ],
	          "model_config": {
	            "model_name": "ts/all-MiniLM-L12-v2"
	          }
	        }
	      }
	    ]
	  }'
	```

	Replace `all-MiniLM-L12-v2` with any model name from this repository, keeping the `ts/` prefix.
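
If you switch between models often, it may be easier to generate the schema from a variable than to edit the JSON by hand. The sketch below reuses the `products` collection from the example above. `collection_schema` is a hypothetical helper name chosen for illustration, not part of Typesense.

```bash
# Hypothetical helper: print the collection schema from the example above,
# with the built-in model name taken from the first argument.
collection_schema() {
  cat <<EOF
{
  "name": "products",
  "fields": [
    { "name": "product_name", "type": "string" },
    {
      "name": "embedding",
      "type": "float[]",
      "embed": {
        "from": ["product_name"],
        "model_config": { "model_name": "ts/$1" }
      }
    }
  ]
}
EOF
}

# Usage (sends the schema to a local Typesense server; -d @- reads the body from stdin):
# collection_schema "all-MiniLM-L12-v2" | curl -X POST 'http://localhost:8108/collections' \
#   -H 'Content-Type: application/json' \
#   -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
#   -d @-
```

The helper only prints JSON, so you can inspect the output before sending it to the server.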

	Here's a detailed step-by-step article with more information: https://typesense.org/docs/guide/semantic-search.html

	## Contributing

	To add a model to the supported list, convert it to the ONNX format, prepare the model config described below, and create a Pull Request (PR) to include it.

	### Convert a model to ONNX format

	#### Converting a Hugging Face Transformers Model
	To convert a model from Hugging Face to ONNX format, follow the [Transformers ONNX export guide](https://huggingface.co/docs/transformers/serialization#export-to-onnx), which uses `optimum-cli`.
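
As a rough sketch of that export, the function below wraps the `optimum-cli export onnx` command. It assumes Optimum's ONNX exporter is already installed. `export_hf_to_onnx`, the model ID, and the output folder in the usage line are placeholders chosen for illustration.

```bash
# Hypothetical wrapper around Optimum's ONNX export command.
# $1: model ID on the Hugging Face Hub (or a local model folder)
# $2: output folder for the exported ONNX files
export_hf_to_onnx() {
  optimum-cli export onnx --model "$1" "$2"
}

# Usage (downloads the model, so it needs network access):
# export_hf_to_onnx "sentence-transformers/all-MiniLM-L12-v2" "all-MiniLM-L12-v2/"
```

Check the linked guide for the options your model needs. The exporter may require a task or other flags that this sketch leaves out.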

	#### Converting a PyTorch Model
	If you have a PyTorch model, you can use the `torch.onnx` APIs to convert it to the ONNX format. The conversion process is described in the [PyTorch ONNX documentation](https://pytorch.org/docs/stable/onnx.html).

	#### Converting a TensorFlow Model
	For TensorFlow models, you can use the `tf2onnx` tool to convert them to the ONNX format. Detailed guidance is available in the [ONNX Runtime TensorFlow tutorial](https://onnxruntime.ai/docs/tutorials/tf-get-started.html#getting-started-converting-tensorflow-to-onnx).
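
A minimal sketch of the `tf2onnx` command line, assuming the model is stored as a TensorFlow SavedModel and that `tf2onnx` is installed in the active Python environment. `export_tf_to_onnx` and the paths in the usage line are placeholders.

```bash
# Hypothetical wrapper around the tf2onnx command-line converter.
# $1: path to a TensorFlow SavedModel folder
# $2: path of the ONNX file to write
export_tf_to_onnx() {
  python -m tf2onnx.convert --saved-model "$1" --output "$2"
}

# Usage:
# export_tf_to_onnx "./saved_model" "model.onnx"
```

Other input formats and settings such as the opset are covered in the linked tutorial and are not shown here.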

	### Create a model config

	Before submitting your ONNX model through a PR, place the necessary files in a folder named after the model. The folder should contain:

	- Model File: The ONNX model file.
	- Vocab File: The vocabulary file required for the model.
	- Model Config File: Named `config.json`, this file should contain the following keys:

	| Key | Description | Optional |
	|-----|-------------|----------|
	| `model_md5` | MD5 checksum of the model file, as a string | No |
	| `vocab_md5` | MD5 checksum of the vocab file, as a string | No |
	| `model_type` | Model type (currently only `bert` and `xlm_roberta` are supported) | No |
	| `vocab_file_name` | File name of the vocab file | No |
	| `indexing_prefix` | Prefix to be added before embedding documents | Yes |
	| `query_prefix` | Prefix to be added before embedding queries | Yes |


	Please make sure that the information in the configuration file is accurate and complete before submitting your PR.
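
One way to keep the checksums accurate is to generate `config.json` from the files themselves. The sketch below is a minimal example under a few assumptions: GNU `md5sum` is available (on macOS, `md5 -q` would be the rough equivalent), the file names contain no characters that need JSON escaping, and `write_model_config` is a hypothetical helper name. It writes only the required keys; add `indexing_prefix` and `query_prefix` by hand if your model needs them.

```bash
# Hypothetical helper: write config.json for a model folder.
# $1: model folder   $2: ONNX file name   $3: vocab file name   $4: model type
write_model_config() {
  dir="$1"; model_file="$2"; vocab_file="$3"; model_type="$4"

  # Stop early if either file is missing, so no empty checksum is written.
  if [ ! -f "$dir/$model_file" ] || [ ! -f "$dir/$vocab_file" ]; then
    echo "missing model or vocab file in $dir" >&2
    return 1
  fi

  # md5sum prints "<checksum>  <path>"; keep only the checksum.
  model_md5="$(md5sum "$dir/$model_file" | cut -d ' ' -f 1)"
  vocab_md5="$(md5sum "$dir/$vocab_file" | cut -d ' ' -f 1)"

  cat > "$dir/config.json" <<EOF
{
  "model_md5": "$model_md5",
  "vocab_md5": "$vocab_md5",
  "model_type": "$model_type",
  "vocab_file_name": "$vocab_file"
}
EOF
}

# Usage (folder and file names are placeholders):
# write_model_config "./my-model" "model.onnx" "vocab.txt" "bert"
```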

	We appreciate your contributions to expand our collection of supported embedding models!