File size: 2,507 Bytes
569f484
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
## Xinference Infer
Xinference is a unified inference platform that provides a unified interface for different inference engines. It supports LLM, text generation, image generation, and more.but it's not bigger than Swift too much.


### Xinference install
Xinference can be installed simply by using the following easy bash code:
```shell
pip install "xinference[all]"
```

### Quick start
The initial steps for conducting inference with Xinference involve downloading the model during the first launch.
1. Start Xinference in the terminal:
```shell
xinference
```
2. Start the web ui.
3. Search for "MiniCPM-Llama3-V-2_5" in the search box.

![alt text](../assets/xinferenc_demo_image/xinference_search_box.png)

4. Find and click the MiniCPM-Llama3-V-2_5 button.
5. Follow the config and launch the model.
```plaintext
Model engine : Transformers
model format : pytorch
Model size   : 8
quantization : none
N-GPU        : auto
Replica      : 1
```
6. After first click the launch button,xinference will download the model from huggingface. We should click the webui button.

![alt text](../assets/xinferenc_demo_image/xinference_webui_button.png)

7. Upload the image and chatting with the MiniCPM-Llama3-V-2_5

### Local MiniCPM-Llama3-V-2_5 Launch
If you have already downloaded the MiniCPM-Llama3-V-2_5 model locally, you can proceed with Xinference inference following these steps:
1. Start Xinference
```shell
xinference
```
2. Start the web ui.
3. To register a new model, follow these steps: the settings highlighted in red are fixed and cannot be changed, whereas others are customizable according to your needs. Complete the process by clicking the 'Register Model' button.

![alt text](../assets/xinferenc_demo_image/xinference_register_model1.png)
![alt text](../assets/xinferenc_demo_image/xinference_register_model2.png)

4. After completing the model registration, proceed to 'Custom Models' and locate the model you just registered.
5. Follow the config and launch the model.
```plaintext
Model engine : Transformers
model format : pytorch
Model size   : 8
quantization : none
N-GPU        : auto
Replica      : 1
```
6. After first click the launch button,Xinference will download the model from Huggingface. we should click the chat button.
![alt text](../assets/xinferenc_demo_image/xinference_webui_button.png)
7. Upload the image and chatting with the MiniCPM-Llama3-V-2_5

### FAQ
1. Why can't the sixth step open the WebUI?

Maybe your firewall or mac os to prevent the web to open.