---
title: 🧩 Embedding models
---
## Overview

Embedchain supports several embedding models from the following providers:

<CardGroup cols={4}>
  <Card title="OpenAI" href="#openai"></Card>
  <Card title="GoogleAI" href="#google-ai"></Card>
  <Card title="Azure OpenAI" href="#azure-openai"></Card>
  <Card title="GPT4All" href="#gpt4all"></Card>
  <Card title="Hugging Face" href="#hugging-face"></Card>
  <Card title="Vertex AI" href="#vertex-ai"></Card>
</CardGroup>
## OpenAI

To use the OpenAI embedding model, you have to set the `OPENAI_API_KEY` environment variable. You can obtain the OpenAI API key from the [OpenAI Platform](https://platform.openai.com/account/api-keys).

Once you have obtained the key, you can use it like this:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```

```yaml config.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-small'
```

</CodeGroup>
OpenAI announced two new embedding models: `text-embedding-3-small` and `text-embedding-3-large`. Embedchain supports both of these models. Below you can find the YAML config for each:
<CodeGroup>

```yaml text-embedding-3-small.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-small'
```

```yaml text-embedding-3-large.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-large'
```

</CodeGroup>
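If you prefer not to keep a separate YAML file, the same settings can also be expressed as a plain Python dict (a minimal sketch, assuming `App.from_config` accepts an in-memory `config` mapping that mirrors the YAML structure):

```python
# The text-embedding-3-large YAML above, expressed as a Python dict.
config = {
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-large",
        },
    }
}

# Assumes App.from_config also accepts a `config` mapping
# (requires OPENAI_API_KEY to be set):
# from embedchain import App
# app = App.from_config(config=config)
```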
## Google AI

To use the Google AI embedding model, you have to set the `GOOGLE_API_KEY` environment variable. You can obtain the Google API key from [Google Maker Suite](https://makersuite.google.com/app/apikey).
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["GOOGLE_API_KEY"] = "xxx"

app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
embedder:
  provider: google
  config:
    model: 'models/embedding-001'
    task_type: "retrieval_document"
    title: "Embeddings for Embedchain"
```

</CodeGroup>
<br/>
<Note>
  For more details regarding the Google AI embedding model, please refer to the [Google AI documentation](https://ai.google.dev/tutorials/python_quickstart#use_embeddings).
</Note>
## Azure OpenAI

To use the Azure OpenAI embedding model, you have to set the Azure OpenAI related environment variables, as shown in the code block below:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://xxx.openai.azure.com/"
os.environ["AZURE_OPENAI_API_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: azure_openai
  config:
    model: gpt-35-turbo
    deployment_name: your_llm_deployment_name
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: your_embedding_model_deployment_name
```

</CodeGroup>
You can find the list of models and deployment names on the [Azure OpenAI Platform](https://oai.azure.com/portal).
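Because a missing or misspelled environment variable typically only surfaces as an error at query time, a small up-front check can save debugging. This is a hypothetical helper, not part of Embedchain; the variable names match the example above:

```python
import os

# The environment variables the Azure OpenAI setup above expects.
REQUIRED_VARS = [
    "OPENAI_API_TYPE",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "OPENAI_API_VERSION",
]

def missing_azure_vars(env=os.environ):
    """Return the names of required Azure OpenAI variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# With a complete (fake) environment, nothing is missing:
complete = {name: "xxx" for name in REQUIRED_VARS}
print(missing_azure_vars(complete))  # → []
```

Call `missing_azure_vars()` before `App.from_config(...)` and raise if the returned list is non-empty.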
## GPT4All

GPT4All supports generating high-quality embeddings for text documents of arbitrary length using a CPU-optimized, contrastively trained Sentence Transformer.
<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b-gguf2-q4_0.gguf'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
```

</CodeGroup>
## Hugging Face

Hugging Face supports generating embeddings for text documents of arbitrary length using the Sentence Transformers library. An example of how to generate embeddings using Hugging Face is given below:
<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: huggingface
  config:
    model: 'google/flan-t5-xxl'
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false

embedder:
  provider: huggingface
  config:
    model: 'sentence-transformers/all-mpnet-base-v2'
```

</CodeGroup>
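The `llm` block above is configured independently of the embedder, so the Hugging Face embedder can also be paired with any other supported LLM provider. A sketch of an embedder-only fragment, reusing the model from the config above:

```yaml
# Hugging Face embedder on its own; combine with whichever `llm` block you use.
embedder:
  provider: huggingface
  config:
    model: 'sentence-transformers/all-mpnet-base-v2'
```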
## Vertex AI

Embedchain supports Google's Vertex AI embedding models through a simple interface. You just have to set the `model` in the config YAML and it works out of the box.
<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vertexai
  config:
    model: 'chat-bison'
    temperature: 0.5
    top_p: 0.5

embedder:
  provider: vertexai
  config:
    model: 'textembedding-gecko'
```

</CodeGroup>