---
title: 🧩 Embedding models
---
## Overview
Embedchain supports several embedding models from the following providers:
<CardGroup cols={4}>
<Card title="OpenAI" href="#openai"></Card>
<Card title="GoogleAI" href="#google-ai"></Card>
<Card title="Azure OpenAI" href="#azure-openai"></Card>
<Card title="GPT4All" href="#gpt4all"></Card>
<Card title="Hugging Face" href="#hugging-face"></Card>
<Card title="Vertex AI" href="#vertex-ai"></Card>
</CardGroup>
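Regardless of provider, an embedder maps text to a fixed-length vector, and retrieval ranks stored chunks by vector similarity. As an illustrative, provider-agnostic sketch (Embedchain handles this internally), cosine similarity between two embedding vectors looks like this:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real models return hundreds or
# thousands of dimensions (e.g. 1536 for text-embedding-3-small).
doc = [0.1, 0.9, 0.2]
query = [0.1, 0.8, 0.3]
unrelated = [0.9, 0.0, 0.1]

assert cosine_similarity(doc, query) > cosine_similarity(doc, unrelated)
```

A chunk whose embedding is closer (higher cosine similarity) to the query embedding is more likely to be returned as context.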
## OpenAI
To use the OpenAI embedding model, set the `OPENAI_API_KEY` environment variable. You can obtain an OpenAI API key from the [OpenAI Platform](https://platform.openai.com/account/api-keys).
Once you have obtained the key, you can use it like this:
<CodeGroup>
```python main.py
import os
from embedchain import App
os.environ['OPENAI_API_KEY'] = 'xxx'
# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```
```yaml config.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-small'
```
</CodeGroup>
* OpenAI announced two new embedding models: `text-embedding-3-small` and `text-embedding-3-large`. Embedchain supports both. Below you can find the YAML config for each:
<CodeGroup>
```yaml text-embedding-3-small.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-small'
```
```yaml text-embedding-3-large.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-large'
```
</CodeGroup>
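If you prefer to keep everything in Python, the same embedder settings can be expressed as a plain dict. This is a minimal sketch assuming your Embedchain version's `App.from_config` accepts a `config` dict argument (if yours only supports YAML files, keep using `config_path`):

```python
# Hypothetical dict equivalent of text-embedding-3-large.yaml above.
config = {
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-large",
        },
    },
}

# from embedchain import App
# app = App.from_config(config=config)
```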
## Google AI
To use the Google AI embedding model, set the `GOOGLE_API_KEY` environment variable. You can obtain a Google API key from the [Google Maker Suite](https://makersuite.google.com/app/apikey).
<CodeGroup>
```python main.py
import os
from embedchain import App
os.environ["GOOGLE_API_KEY"] = "xxx"
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
embedder:
  provider: google
  config:
    model: 'models/embedding-001'
    task_type: "retrieval_document"
    title: "Embeddings for Embedchain"
```
</CodeGroup>
<br/>
<Note>
For more details regarding the Google AI embedding model, please refer to the [Google AI documentation](https://ai.google.dev/tutorials/python_quickstart#use_embeddings).
</Note>
## Azure OpenAI
To use an Azure OpenAI embedding model, set the Azure OpenAI-related environment variables as shown in the code block below:
<CodeGroup>
```python main.py
import os
from embedchain import App
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://xxx.openai.azure.com/"
os.environ["AZURE_OPENAI_API_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: azure_openai
  config:
    model: gpt-35-turbo
    deployment_name: your_llm_deployment_name
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: your_embedding_model_deployment_name
```
</CodeGroup>
You can find the list of models and deployment names on the [Azure OpenAI Platform](https://oai.azure.com/portal).
## GPT4All
GPT4All supports generating high-quality embeddings of arbitrary-length text documents using a CPU-optimized, contrastively trained Sentence Transformer.
<CodeGroup>
```python main.py
from embedchain import App
# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b-gguf2-q4_0.gguf'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
```
</CodeGroup>
## Hugging Face
Hugging Face supports generating embeddings of arbitrary-length text documents using the Sentence Transformers library. An example of how to generate embeddings using Hugging Face is given below:
<CodeGroup>
```python main.py
from embedchain import App
# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: huggingface
  config:
    model: 'google/flan-t5-xxl'
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false

embedder:
  provider: huggingface
  config:
    model: 'sentence-transformers/all-mpnet-base-v2'
```
</CodeGroup>
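Under the hood, Sentence Transformer models typically produce one sentence embedding by pooling per-token embeddings. The following is an illustrative, pure-Python sketch of mean pooling (the actual library works on model tensors, not lists):

```python
def mean_pool(token_embeddings):
    # token_embeddings: a list of equal-length vectors, one per token.
    # The sentence embedding is the element-wise average of them.
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(vec[i] for vec in token_embeddings) / n for i in range(dim)]

# Two toy 2-dimensional token embeddings averaged into one vector.
tokens = [[1.0, 2.0], [3.0, 4.0]]
sentence_embedding = mean_pool(tokens)  # [2.0, 3.0]
```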
## Vertex AI
Embedchain supports Google's Vertex AI embedding models through a simple interface. You just have to pass the `model` in the config YAML and it works out of the box.
<CodeGroup>
```python main.py
from embedchain import App
# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
  provider: vertexai
  config:
    model: 'chat-bison'
    temperature: 0.5
    top_p: 0.5

embedder:
  provider: vertexai
  config:
    model: 'textembedding-gecko'
```
</CodeGroup>