---
license: apache-2.0
---

<center>
<h3>Welcome to llawa</h3>
<img src="https://huggingface.co/juntaoyuan/llawa/resolve/main/llawa-logo.png"/>
<br/><i>a.k.a. Llama2 + Wasm QA</i>
</center>


The models in this repo are Llama2 7B chat models further fine-tuned on Wasm-related Q&As.
The simplest way to run them on your own laptop, server, or edge device, without struggling with Python and PyTorch, is the [WasmEdge Runtime](https://github.com/WasmEdge/WasmEdge).
Learn more about this [fast, lightweight, portable, and zero-Python-dependency approach](https://medium.com/stackademic/fast-and-portable-llama2-inference-on-the-heterogeneous-edge-a62508e82359) to running AI applications!

1. Install WasmEdge with the WASI-NN GGML plugin

```
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasi_nn-ggml
```
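
After the installer finishes, open a new terminal (or source the environment file the installer reports) so that `wasmedge` is on your `PATH`. A quick sanity check:

```
wasmedge --version
```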

2. Download the chat app. It is a portable Wasm bytecode app that runs across CPUs, GPUs, and OSes.

```
curl -LO https://github.com/second-state/llama-utils/raw/main/chat/llama-chat.wasm
```
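
The chat app expects the fine-tuned GGUF model file in the same directory. You can download it from this repo first; the URL below is a sketch assuming the standard Hugging Face `resolve/main` file path:

```
curl -LO https://huggingface.co/juntaoyuan/llawa/resolve/main/llama-2-7b-chat-wasm-q5_k_m.gguf
```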

3. Chat with the model on the CLI

```
# Map the current directory into the Wasm sandbox so the app can read the
# model file, preload the GGUF model for the WASI-NN GGML plugin under the
# alias "default", and stop generation at the "</s>" end-of-sequence token.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:llama-2-7b-chat-wasm-q5_k_m.gguf \
  llama-chat.wasm --reverse-prompt "</s>"
```

Below is an example chat session:

```
[USER]:
What is Wasm?
[ASSISTANT]:
Wasm is a binary instruction format for a stack-based virtual machine. It's a portable target for the WebAssembly runtime.
[USER]:
What is WASI-NN?
[ASSISTANT]:
WASI-NN is a Wasm system interface for neural networks. It allows Wasm apps to use and integrate with different neural network frameworks.
[USER]:
Which runtimes support it?
[ASSISTANT]:
Wasm is supported by a number of runtimes including WasmEdge, Wasmtime, and Wasmer.
[USER]:
How do I run AI workloads in serverless functions?
[ASSISTANT]:
You can use Wasm to run AI workloads in serverless functions. WasmEdge supports running AI workloads using the WASI-NN interface.
```

> The `llama-2-7b-chat-wasm-overfit-q5_k_m.gguf` file is the fine-tuned model at epoch 25. It has a training loss of 0.03 and is probably overfitted. You can ask it the questions above and see it give poor answers. We believe a training loss between 0.05 and 0.1 is optimal for this model.
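>
> To see this for yourself, run the same chat command against the overfit checkpoint; only the `--nn-preload` filename changes:
>
> ```
> wasmedge --dir .:. \
>   --nn-preload default:GGML:AUTO:llama-2-7b-chat-wasm-overfit-q5_k_m.gguf \
>   llama-chat.wasm --reverse-prompt "</s>"
> ```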