Commit 00e5134 (parent: ba9719a): Update README.md

README.md CHANGED
---
license: apache-2.0
---

The models in this repo are Llama2 7b chat models further fine-tuned with Wasm-related Q&As.
Instead of struggling with Python and PyTorch, the simplest way to run them on your own laptops, servers, or edge devices is to use the [WasmEdge Runtime](https://github.com/WasmEdge/WasmEdge).
[Learn more](https://medium.com/stackademic/fast-and-portable-llama2-inference-on-the-heterogeneous-edge-a62508e82359) about this fast, lightweight, portable, and ZERO Python dependency approach for running AI applications!
1. Install WasmEdge
```
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasi_nn-ggml
```
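
If the install script completes, a quick sanity check (assuming the default install location, where the installer writes an env file under `$HOME/.wasmedge`) is:

```
# Load the environment the installer set up, then confirm the runtime is on PATH.
source $HOME/.wasmedge/env
wasmedge --version
```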

2. Download the chat app. It is a portable Wasm bytecode app. It runs across CPUs, GPUs, and OSes.
```
cd ~
curl -LO https://github.com/second-state/llama-utils/raw/main/chat/llama-chat.wasm
```
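
The chat command in the next step expects the quantized model file in the current directory. As a sketch (the download URL is an assumption; substitute this repository's actual Hugging Face path for `<org>/<repo>`):

```
# Assumption: the .gguf file is hosted in this model repo on Hugging Face.
curl -LO https://huggingface.co/<org>/<repo>/resolve/main/llama-2-7b-chat-wasm-q5_k_m.gguf
```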

3. Chat with the model on the CLI
```
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:llama-2-7b-chat-wasm-q5_k_m.gguf \
  llama-chat.wasm --reverse-prompt "</s>"
```
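
For reference, the `--nn-preload` value is four colon-separated fields: a model alias, the inference backend, the execution target, and the model file. A minimal shell sketch of how the string breaks apart:

```
# Split the --nn-preload string into its four fields:
#   alias : backend : execution target (AUTO, CPU, GPU) : model file
PRELOAD="default:GGML:AUTO:llama-2-7b-chat-wasm-q5_k_m.gguf"
IFS=':' read -r ALIAS BACKEND TARGET MODEL <<< "$PRELOAD"
echo "alias=$ALIAS backend=$BACKEND target=$TARGET model=$MODEL"
```

`AUTO` lets WasmEdge pick the best available device, which is why the same command works on both CPU-only and GPU machines.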
Below is an example chat session:

```
[USER]:
What is Wasm?
[ASSISTANT]:
Wasm is a binary instruction format for a stack-based virtual machine. It's a portable target for the WebAssembly runtime.
[USER]:
What is WASI-NN?
[ASSISTANT]:
WASI-NN is a Wasm system interface for neural networks. It allows Wasm apps to use and integrate with different neural network frameworks.
[USER]:
Which runtimes support it?
[ASSISTANT]:
Wasm is supported by a number of runtimes including WasmEdge, Wasmtime, and Wasmer.
[USER]:
How do I run AI workloads in serverless functions?
[ASSISTANT]:
You can use Wasm to run AI workloads in serverless functions. WasmEdge supports running AI workloads using the WASI-NN interface.
```