tyoyo commited on
Commit
05aa8dc
·
verified ·
1 Parent(s): bad0264

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +7 -3
README.md CHANGED
@@ -27,9 +27,9 @@ The following table shows the performance degradation due to quantization:
27
 
28
  | Model | ELYZA-tasks-100 GPT4 score |
29
  | :-------------------------------- | ---: |
30
- | Llama-3-ELYZA-JP-8B | 3.655 |
31
- | Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M) | 3.57 |
32
- | Llama-3-ELYZA-JP-8B-AWQ | 3.39 |
33
 
34
 
35
  ## Use with llama.cpp
@@ -90,6 +90,10 @@ There are various desktop applications that can handle GGUF models, but here we
90
  - **Setting Options**: You can set options from the sidebar on the right. Faster inference can be achieved by setting Quick GPU Offload to Max in the GPU Settings.
91
  - **(For Developers) Starting an API Server**: Click `<->` in the left sidebar and move to the Local Server tab. Select the model and click Start Server to launch an OpenAI API-compatible API server.
92
 
 
 
 
 
93
  ## Developers
94
 
95
  Listed in alphabetical order.
 
27
 
28
  | Model | ELYZA-tasks-100 GPT4 score |
29
  | :-------------------------------- | ---: |
30
+ | [Llama-3-ELYZA-JP-8B](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B) | 3.655 |
31
+ | [Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M)](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-GGUF) | 3.57 |
32
+ | [Llama-3-ELYZA-JP-8B-AWQ](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-AWQ) | 3.39 |
33
 
34
 
35
  ## Use with llama.cpp
 
90
  - **Setting Options**: You can set options from the sidebar on the right. Faster inference can be achieved by setting Quick GPU Offload to Max in the GPU Settings.
91
  - **(For Developers) Starting an API Server**: Click `<->` in the left sidebar and move to the Local Server tab. Select the model and click Start Server to launch an OpenAI API-compatible API server.
92
 
93
+ ![lmstudio-demo](./lmstudio-demo.gif)
94
+
95
+ This demo showcases Llama-3-ELYZA-JP-8B-GGUF running smoothly on a MacBook Pro (M1 Pro), achieving an inference speed of approximately 20 tokens per second.
96
+
97
  ## Developers
98
 
99
  Listed in alphabetical order.