---
library_name: transformers
license: llama3
language:
- ja
- en
tags:
- llama-cpp
---

# Llama-3-ELYZA-JP-8B-GGUF

![Llama-3-ELYZA-JP-8B-image](./key_visual.png)

## Model Description

**Llama-3-ELYZA-JP-8B** is a large language model trained by [ELYZA, Inc](https://elyza.ai/).
Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), it has been enhanced for Japanese usage through additional pre-training and instruction tuning.

For more details, please refer to [our blog post](https://note.com/elyza/n/n360b6084fdbd).

## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```

Invoke the llama.cpp server:

```bash
llama-server \
  --hf-repo elyza/Llama-3-ELYZA-JP-8B-GGUF \
  --hf-file Llama-3-ELYZA-JP-8B-q4_k_m.gguf \
  --port 8080
```
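
For a one-off generation without running a server, llama.cpp's CLI can load the same file directly. A minimal sketch; flag names follow recent llama.cpp builds (for chat-template-formatted interaction, add `-cnv`):

```bash
# Download the GGUF from the Hub (if not cached) and run a single prompt.
llama-cli \
  --hf-repo elyza/Llama-3-ELYZA-JP-8B-GGUF \
  --hf-file Llama-3-ELYZA-JP-8B-q4_k_m.gguf \
  -p "古代ギリシャを学ぶ上で知っておくべきポイントは?" \
  -n 256
```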

Call the API using curl. The Japanese system prompt reads "You are a sincere and excellent Japanese assistant. Unless instructed otherwise, always answer in Japanese.", and the user message asks what to know when studying ancient Greece:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。" },
      { "role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?" }
    ],
    "temperature": 0.6,
    "max_tokens": -1,
    "stream": false
  }'
```
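
To stream tokens as they are generated, change `"stream": false` to `"stream": true` in the same request; the server then replies with server-sent events. A minimal sketch:

```bash
# -N disables curl's buffering so the streamed chunks print as they arrive.
curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?" }
    ],
    "stream": true
  }'
```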

Call the API using Python:

```python
import openai

# Point the OpenAI client at the local llama.cpp server.
client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="dummy_api_key"  # llama-server does not check the key by default, but the client requires one
)

completion = client.chat.completions.create(
    model="dummy_model_name",  # the server hosts a single model, so the name is not used for routing
    messages=[
        {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"},
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"}
    ]
)

print(completion.choices[0].message.content)
```

## Use with Desktop App

There are various desktop applications that can run GGUF models; here we introduce LM Studio, which lets you use the model in a local environment without writing any code.

- **Installation**: Download and install [LM Studio](https://lmstudio.ai/).
- **Downloading the Model**: Search for `elyza/Llama-3-ELYZA-JP-8B-GGUF` in the search bar on the home page 🏠, and download `Llama-3-ELYZA-JP-8B-q4_k_m.gguf`.
- **Start Chatting**: Click on 💬 in the sidebar, select `Llama-3-ELYZA-JP-8B-GGUF` from "Select a Model to load" in the header, and load the model. You can now chat freely with the local LLM.
- **Setting Options**: You can set options from the sidebar on the right. Faster inference can be achieved by setting Quick GPU Offload Settings to Max in the GPU Settings.
- **For Developers, Starting the API Server**: Click `<->` in the left sidebar and move to the Local Server tab. Select the model and click Start Server to launch an OpenAI API-compatible server (see the sketch after this list).
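
Because the Local Server speaks the same OpenAI-compatible protocol, the earlier curl and Python examples work against it unchanged. A minimal sketch, assuming LM Studio's default port of 1234 (check the Local Server tab for the actual address):

```bash
# Query LM Studio's local OpenAI-compatible server.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?" }
    ],
    "temperature": 0.6
  }'
```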
|
82 |
+
## Quantization Options
|
83 |
+
|
84 |
+
Currently, we only offer quantized models in the Q4_K_M format.
|
85 |
+
|
86 |
+
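
If you need a different size/quality trade-off, llama.cpp can produce other quantization levels from the original [elyza/Llama-3-ELYZA-JP-8B](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B) weights. A rough sketch; the converter script and `llama-quantize` binary follow recent llama.cpp releases and may be named differently in older ones:

```bash
# Convert the original HF checkpoint to an F16 GGUF, then quantize to Q5_K_M.
python convert_hf_to_gguf.py ./Llama-3-ELYZA-JP-8B --outfile Llama-3-ELYZA-JP-8B-f16.gguf
llama-quantize Llama-3-ELYZA-JP-8B-f16.gguf Llama-3-ELYZA-JP-8B-q5_k_m.gguf Q5_K_M
```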

## Developers

Listed in alphabetical order.

- [Masato Hirakawa](https://huggingface.co/m-hirakawa)
- [Shintaro Horie](https://huggingface.co/e-mon)
- [Tomoaki Nakamura](https://huggingface.co/tyoyo)
- [Daisuke Oba](https://huggingface.co/daisuk30ba)
- [Sam Passaglia](https://huggingface.co/passaglia)
- [Akira Sasaki](https://huggingface.co/akirasasaki)

## License

[Meta Llama 3 Community License](https://llama.meta.com/llama3/license/)

## How to Cite

```tex
@misc{elyzallama2024,
  title={elyza/Llama-3-ELYZA-JP-8B},
  url={https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B},
  author={Masato Hirakawa and Shintaro Horie and Tomoaki Nakamura and Daisuke Oba and Sam Passaglia and Akira Sasaki},
  year={2024},
}
```

## Citations

```tex
@article{llama3modelcard,
  title={Llama 3 Model Card},
  author={AI@Meta},
  year={2024},
  url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```