---
title: 🧩 Embedding models
---
## Overview

Embedchain supports several embedding models from the following providers:

<CardGroup cols={4}>
  <Card title="OpenAI" href="#openai"></Card>
  <Card title="GoogleAI" href="#google-ai"></Card>
  <Card title="Azure OpenAI" href="#azure-openai"></Card>
  <Card title="GPT4All" href="#gpt4all"></Card>
  <Card title="Hugging Face" href="#hugging-face"></Card>
  <Card title="Vertex AI" href="#vertex-ai"></Card>
</CardGroup>
## OpenAI

To use the OpenAI embedding model, you have to set the `OPENAI_API_KEY` environment variable. You can obtain the OpenAI API key from the [OpenAI Platform](https://platform.openai.com/account/api-keys).

Once you have obtained the key, you can use it like this:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```

```yaml config.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-small'
```

</CodeGroup>
OpenAI announced two new embedding models: `text-embedding-3-small` and `text-embedding-3-large`. Embedchain supports both of these models. Below you can find the YAML config for each:
<CodeGroup>

```yaml text-embedding-3-small.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-small'
```

```yaml text-embedding-3-large.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-large'
```

</CodeGroup>
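If you prefer not to keep a separate YAML file, the same settings can also be expressed as a plain Python dict (a minimal sketch, assuming `App.from_config` accepts an in-memory `config` mapping that mirrors the YAML structure):

```python
# The text-embedding-3-large YAML above, expressed as a Python dict.
config = {
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-large",
        },
    }
}

# Assumes App.from_config also accepts a `config` mapping
# (requires OPENAI_API_KEY to be set):
# from embedchain import App
# app = App.from_config(config=config)
```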
## Google AI

To use the Google AI embedding model, you have to set the `GOOGLE_API_KEY` environment variable. You can obtain the Google API key from [Google Maker Suite](https://makersuite.google.com/app/apikey).
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["GOOGLE_API_KEY"] = "xxx"

app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
embedder:
  provider: google
  config:
    model: 'models/embedding-001'
    task_type: "retrieval_document"
    title: "Embeddings for Embedchain"
```

</CodeGroup>
<br/>
<Note>
  For more details regarding the Google AI embedding model, please refer to the [Google AI documentation](https://ai.google.dev/tutorials/python_quickstart#use_embeddings).
</Note>
## Azure OpenAI

To use the Azure OpenAI embedding model, you have to set the Azure OpenAI related environment variables, as shown in the code block below:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://xxx.openai.azure.com/"
os.environ["AZURE_OPENAI_API_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: azure_openai
  config:
    model: gpt-35-turbo
    deployment_name: your_llm_deployment_name
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: your_embedding_model_deployment_name
```

</CodeGroup>
You can find the list of models and deployment names on the [Azure OpenAI Platform](https://oai.azure.com/portal).
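Because a missing or misspelled environment variable typically only surfaces as an error at query time, a small up-front check can save debugging. This is a hypothetical helper, not part of Embedchain; the variable names match the example above:

```python
import os

# The environment variables the Azure OpenAI setup above expects.
REQUIRED_VARS = [
    "OPENAI_API_TYPE",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "OPENAI_API_VERSION",
]

def missing_azure_vars(env=os.environ):
    """Return the names of required Azure OpenAI variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# With a complete (fake) environment, nothing is missing:
complete = {name: "xxx" for name in REQUIRED_VARS}
print(missing_azure_vars(complete))  # → []
```

Call `missing_azure_vars()` before `App.from_config(...)` and raise if the returned list is non-empty.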
## GPT4All

GPT4All supports generating high-quality embeddings for text documents of arbitrary length using a CPU-optimized, contrastively trained Sentence Transformer.
<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b-gguf2-q4_0.gguf'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
```

</CodeGroup>
## Hugging Face

Hugging Face supports generating embeddings for text documents of arbitrary length using the Sentence Transformers library. An example of how to generate embeddings using Hugging Face is given below:
<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: huggingface
  config:
    model: 'google/flan-t5-xxl'
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false

embedder:
  provider: huggingface
  config:
    model: 'sentence-transformers/all-mpnet-base-v2'
```

</CodeGroup>
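The `llm` block above is configured independently of the embedder, so the Hugging Face embedder can also be paired with any other supported LLM provider. A sketch of an embedder-only fragment, reusing the model from the config above:

```yaml
# Hugging Face embedder on its own; combine with whichever `llm` block you use.
embedder:
  provider: huggingface
  config:
    model: 'sentence-transformers/all-mpnet-base-v2'
```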
## Vertex AI

Embedchain supports Google's Vertex AI embedding models through a simple interface. You just have to set the `model` in the config YAML and it works out of the box.
<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vertexai
  config:
    model: 'chat-bison'
    temperature: 0.5
    top_p: 0.5

embedder:
  provider: vertexai
  config:
    model: 'textembedding-gecko'
```

</CodeGroup>