writinwaters committed on
Commit
286159b
1 Parent(s): 8ee6bdd

Minor editorial updates (#2207)


### What problem does this PR solve?



### Type of change

- [x] Documentation Update

Files changed (1)
  1. docs/guides/deploy_local_llm.mdx +34 -35
docs/guides/deploy_local_llm.mdx CHANGED
@@ -7,7 +7,7 @@ slug: /deploy_local_llm
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';

- RAGFlow supports deploying models locally using Ollama or Xinference. If you have locally deployed models to leverage or wish to enable GPU or CUDA for inference acceleration, you can bind Ollama or Xinference into RAGFlow and use either of them as a local "server" for interacting with your local models.

 RAGFlow seamlessly integrates with Ollama and Xinference, without the need for further environment configurations. You can use them to deploy two types of local models in RAGFlow: chat models and embedding models.

@@ -15,40 +15,6 @@ RAGFlow seamlessly integrates with Ollama and Xinference, without the need for f
 This user guide does not intend to cover much of the installation or configuration details of Ollama or Xinference; its focus is on configurations inside RAGFlow. For the most current information, you may need to check out the official site of Ollama or Xinference.
 :::

- # Deploy a local model using jina
-
- [Jina](https://github.com/jina-ai/jina) lets you build AI services and pipelines that communicate via gRPC, HTTP and WebSockets, then scale them up and deploy to production.
-
- To deploy a local model, e.g., **gpt2**, using Jina:
-
- ### 1. Check firewall settings
-
- Ensure that your host machine's firewall allows inbound connections on port 12345.
-
- ```bash
- sudo ufw allow 12345/tcp
- ```
-
- ### 2.install jina package
-
- ```bash
- pip install jina
- ```
-
- ### 3. deployment local model
-
- Step 1: Navigate to the rag/svr directory.
-
- ```bash
- cd rag/svr
-
- ```
- Step 2: Use Python to run the jina_server.py script and pass in the model name or the local path of the model (the script only supports loading models downloaded from Huggingface)
-
- ```bash
- python jina_server.py --model_name gpt2
- ```
-
 ## Deploy a local model using Ollama

 [Ollama](https://github.com/ollama/ollama) enables you to run open-source large language models that you deployed locally. It bundles model weights, configurations, and data into a single package, defined by a Modelfile, and optimizes setup and configurations, including GPU usage.
@@ -347,3 +313,36 @@ To enable IPEX-LLM accelerated Ollama in RAGFlow, you must also complete the con
 2. [Complete basic Ollama settings](#5-complete-basic-ollama-settings)
 3. [Update System Model Settings](#6-update-system-model-settings)
 4. [Update Chat Configuration](#7-update-chat-configuration)

 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';

+ RAGFlow supports deploying models locally using Ollama, Xinference, IPEX-LLM, or jina. If you have locally deployed models to leverage or wish to enable GPU or CUDA for inference acceleration, you can bind Ollama or Xinference into RAGFlow and use either of them as a local "server" for interacting with your local models.

 RAGFlow seamlessly integrates with Ollama and Xinference, without the need for further environment configurations. You can use them to deploy two types of local models in RAGFlow: chat models and embedding models.

 This user guide does not intend to cover much of the installation or configuration details of Ollama or Xinference; its focus is on configurations inside RAGFlow. For the most current information, you may need to check out the official site of Ollama or Xinference.
 :::

 ## Deploy a local model using Ollama

 [Ollama](https://github.com/ollama/ollama) enables you to run open-source large language models that you deployed locally. It bundles model weights, configurations, and data into a single package, defined by a Modelfile, and optimizes setup and configurations, including GPU usage.
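As a hedged illustration of that workflow (not part of this commit; the model and file names below are only examples), pulling a base model, deriving a custom one from a minimal Modelfile, and running it locally looks roughly like this:

```bash
# Pull a base model from the Ollama library (model name is an example)
ollama pull llama2

# Write a minimal Modelfile that derives a custom model from that base
cat > Modelfile <<'EOF'
FROM llama2
PARAMETER temperature 0.7
EOF

# Build the custom model and chat with it; the Ollama server listens on port 11434 by default
ollama create my-local-model -f Modelfile
ollama run my-local-model "Hello"
```

The "Complete basic Ollama settings" step referenced below is where RAGFlow would then be pointed at this local server.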
 
 2. [Complete basic Ollama settings](#5-complete-basic-ollama-settings)
 3. [Update System Model Settings](#6-update-system-model-settings)
 4. [Update Chat Configuration](#7-update-chat-configuration)
+
+ ## Deploy a local model using jina
+
+ To deploy a local model, e.g., **gpt2**, using jina:
+
+ ### 1. Check firewall settings
+
+ Ensure that your host machine's firewall allows inbound connections on port 12345.
+
+ ```bash
+ sudo ufw allow 12345/tcp
+ ```
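If you want to confirm the rule took effect before moving on (assuming ufw is the firewall in use, as in the command above), a quick check is:

```bash
# List the active ufw rules and confirm port 12345 is allowed
sudo ufw status verbose | grep 12345
```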
+
+ ### 2. Install jina package
+
+ ```bash
+ pip install jina
+ ```
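To confirm the package is importable from the same Python environment that will run the server, one quick check is:

```bash
# Print the installed jina version (recent releases expose __version__)
python -c "import jina; print(jina.__version__)"
```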
+
+ ### 3. Deploy a local model
+
+ Step 1: Navigate to the **rag/svr** directory.
+
+ ```bash
+ cd rag/svr
+ ```
+
+ Step 2: Run **jina_server.py**, specifying either the model's name or its local directory:
+
+ ```bash
+ python jina_server.py --model_name gpt2
+ ```
+ > The script only supports models downloaded from Hugging Face.
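Since Step 2 says the script also accepts a local directory, a hedged variant (the path below is purely illustrative) would be:

```bash
# Point --model_name at a locally downloaded model directory instead of a Hugging Face model name
python jina_server.py --model_name /path/to/your/local/gpt2
```

The firewall rule opened in step 1 suggests the resulting server is expected to be reachable on port 12345.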