writinwaters committed
Commit 286159b · 1 Parent(s): 8ee6bdd

Minor editorial updates (#2207)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update

docs/guides/deploy_local_llm.mdx  CHANGED  (+34 -35)

@@ -7,7 +7,7 @@ slug: /deploy_local_llm
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 
-RAGFlow supports deploying models locally using Ollama or
+RAGFlow supports deploying models locally using Ollama, Xinference, IPEX-LLM, or jina. If you have locally deployed models to leverage or wish to enable GPU or CUDA for inference acceleration, you can bind Ollama or Xinference into RAGFlow and use either of them as a local "server" for interacting with your local models.
 
 RAGFlow seamlessly integrates with Ollama and Xinference, without the need for further environment configurations. You can use them to deploy two types of local models in RAGFlow: chat models and embedding models.
 
@@ -15,40 +15,6 @@ RAGFlow seamlessly integrates with Ollama and Xinference, without the need for f
 This user guide does not intend to cover much of the installation or configuration details of Ollama or Xinference; its focus is on configurations inside RAGFlow. For the most current information, you may need to check out the official site of Ollama or Xinference.
 :::
 
-# Deploy a local model using jina
-
-[Jina](https://github.com/jina-ai/jina) lets you build AI services and pipelines that communicate via gRPC, HTTP and WebSockets, then scale them up and deploy to production.
-
-To deploy a local model, e.g., **gpt2**, using Jina:
-
-### 1. Check firewall settings
-
-Ensure that your host machine's firewall allows inbound connections on port 12345.
-
-```bash
-sudo ufw allow 12345/tcp
-```
-
-### 2.install jina package
-
-```bash
-pip install jina
-```
-
-### 3. deployment local model
-
-Step 1: Navigate to the rag/svr directory.
-
-```bash
-cd rag/svr
-```
-
-Step 2: Use Python to run the jina_server.py script and pass in the model name or the local path of the model (the script only supports loading models downloaded from Huggingface)
-
-```bash
-python jina_server.py --model_name gpt2
-```
-
 ## Deploy a local model using Ollama
 
 [Ollama](https://github.com/ollama/ollama) enables you to run open-source large language models that you deployed locally. It bundles model weights, configurations, and data into a single package, defined by a Modelfile, and optimizes setup and configurations, including GPU usage.
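
As an aside on the Modelfile mentioned in the context line above: it is Ollama's packaging recipe for a model. A minimal sketch of that workflow, using a placeholder base model, parameter, and system prompt (none of these values come from the RAGFlow docs), might look like this:

```bash
# Hypothetical Modelfile: the base model, parameter, and system prompt are placeholders.
cat > Modelfile <<'EOF'
FROM llama3
PARAMETER temperature 0.7
SYSTEM "You are a helpful assistant."
EOF

# Build a named local model from the Modelfile, then chat with it.
ollama create my-local-model -f Modelfile
ollama run my-local-model
```
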
@@ -347,3 +313,36 @@ To enable IPEX-LLM accelerated Ollama in RAGFlow, you must also complete the con
 2. [Complete basic Ollama settings](#5-complete-basic-ollama-settings)
 3. [Update System Model Settings](#6-update-system-model-settings)
 4. [Update Chat Configuration](#7-update-chat-configuration)
+
+## Deploy a local model using jina
+
+To deploy a local model, e.g., **gpt2**, using jina:
+
+### 1. Check firewall settings
+
+Ensure that your host machine's firewall allows inbound connections on port 12345.
+
+```bash
+sudo ufw allow 12345/tcp
+```
+
+### 2. Install jina package
+
+```bash
+pip install jina
+```
+
+### 3. Deploy a local model
+
+Step 1: Navigate to the **rag/svr** directory.
+
+```bash
+cd rag/svr
+```
+
+Step 2: Run **jina_server.py**, specifying either the model's name or its local directory:
+
+```bash
+python jina_server.py --model_name gpt2
+```
348 |
+
+> The script only supports models downloaded from Hugging Face.
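
Because **jina_server.py** accepts a local directory as well as a model name, a model already downloaded from Hugging Face can presumably be served by pointing the same flag at its folder; the path below is purely illustrative:

```bash
# Hypothetical: pass a local Hugging Face snapshot directory instead of a hub model name.
python jina_server.py --model_name /path/to/local/gpt2
```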
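
To sanity-check the firewall rule from step 1 and the running server, generic checks such as the following can help (assuming ufw is the active firewall on the host):

```bash
# Confirm the ufw rule is present.
sudo ufw status | grep 12345

# Once jina_server.py is running, confirm something is listening on port 12345.
ss -ltn | grep 12345
```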