---
license: mit
---

## About llamafile
<a href="https://github.com/Mozilla-Ocho/llamafile" target="_blank">github</a><br/>
<a href="https://www.bingal.com/posts/ai-llamafile-usage/" target="_blank">llamafile usage guide (in Chinese)</a><br/>
<a href="https://www.modelscope.cn/models/bingal/llamafile-models/summary" target="_blank">The llamafile model collection on modelscope.cn</a><br/>
<a href="https://www.modelscope.cn/models/bingal/Qwen1.5-14B-Chat-llamafile/summary" target="_blank">Qwen1.5-14B-Chat-llamafile on modelscope.cn</a>

## Usage

1. Download the model: `qwen1.5-14b-chat-q5_k_m.llamafile`
2. Run the model
   * Windows
     1. Rename the file to `qwen1.5-14b-chat-q5_k_m.exe`
     2. Open a terminal window and run: `.\qwen1.5-14b-chat-q5_k_m.exe`
     3. Open a browser at http://127.0.0.1:8080 to start chatting
   * Linux / macOS
     1. Grant execute permission: `chmod +x ./qwen1.5-14b-chat-q5_k_m.llamafile`
     2. Run in a terminal: `./qwen1.5-14b-chat-q5_k_m.llamafile`
     3. Open a browser at http://127.0.0.1:8080 to start chatting
3. OpenAI API usage
   * API URL: `http://127.0.0.1:8080/v1`
   * Python code (a standard-library variant without the `openai` package is sketched right after this list):
   ```python
   #!/usr/bin/env python3
   from openai import OpenAI

   # Point the client at the local llamafile server; no real API key is required.
   client = OpenAI(
       base_url="http://127.0.0.1:8080/v1",  # "http://<Your api-server IP>:port"
       api_key="sk-no-key-required",
   )
   completion = client.chat.completions.create(
       model="LLaMA_CPP",
       messages=[
           {"role": "system", "content": "You are an AI assistant."},
           {"role": "user", "content": "Write a story about a dragon"},
       ],
   )
   print(completion.choices[0].message)
   ```
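
The same endpoint can also be called without installing the `openai` package. Below is a minimal sketch using only the Python standard library; it assumes the server from step 2 is running on the default host and port, and relies on the server exposing the OpenAI-style chat-completions request/response shape under `/v1`.

```python
#!/usr/bin/env python3
import json
import urllib.request

# Build an OpenAI-style chat-completions request for the local server.
payload = {
    "model": "LLaMA_CPP",
    "messages": [
        {"role": "system", "content": "You are an AI assistant."},
        {"role": "user", "content": "Write a story about a dragon"},
    ],
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer sk-no-key-required",  # placeholder; no real key is needed
    },
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# The generated text is in the first choice's message.
print(body["choices"][0]["message"]["content"])
```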

## Parameter Description

- `-ngl 9999` sets how many model layers are offloaded to the GPU; the remaining layers run on the CPU. If no GPU is available, set `-ngl 0`. The default is 9999, which offloads everything to the GPU (GPU drivers and the CUDA runtime must be installed).
- `--host 0.0.0.0` sets the address the web service binds to. For local-only access, use `--host 127.0.0.1`; when bound to `0.0.0.0`, the server can be reached by IP from other machines on the network.
- `--port 8080` sets the web service port; the default is `8080`.
- `-t 16` sets the number of threads. When running on the CPU, set this according to your CPU core count.
- Other parameters can be viewed with `--help`. A launch sketch combining the flags above follows below.
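
As one way to put these flags together, here is a minimal Python sketch (not from the original instructions) that launches the server with explicit values for each parameter described above; the filename and flag values are assumptions taken from the steps earlier in this README:

```python
#!/usr/bin/env python3
import subprocess

# Launch through sh, mirroring the `./...` terminal invocation in step 2
# (llamafiles are polyglot binaries that some kernels will not execve directly).
cmd = (
    "./qwen1.5-14b-chat-q5_k_m.llamafile"
    " -ngl 0"            # CPU-only; raise it (or keep the 9999 default) to offload layers to the GPU
    " --host 127.0.0.1"  # local access only; use 0.0.0.0 to expose on the network
    " --port 8080"       # default web/API port
    " -t 16"             # CPU thread count
)
subprocess.run(["sh", "-c", cmd], check=True)
```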