---
library_name: transformers
license: llama3
language:
- ja
- en
---

# Llama-3-ELYZA-JP-8B-AWQ

![Llama-3-ELYZA-JP-8B-image](./key_visual.png)

## Model Description

**Llama-3-ELYZA-JP-8B** is a large language model trained by [ELYZA, Inc](https://elyza.ai/).
Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), it has been enhanced for Japanese usage through additional pre-training and instruction tuning.

For more details, please refer to [our blog post](https://note.com/elyza/n/n360b6084fdbd).

## Quantization

Two quantized variants are available: GGUF and AWQ. This repository contains the AWQ variant, produced with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ).

The following table shows how much quantization lowers the ELYZA-tasks-100 score relative to the unquantized model:

| Model | ELYZA-tasks-100 GPT4 score |
| :-------------------------------- | ---: |
| [Llama-3-ELYZA-JP-8B](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B)               | 3.655 |
| [Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M)](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-GGUF) | 3.57  |
| [Llama-3-ELYZA-JP-8B-AWQ](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-AWQ)           | 3.39  |
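
To put the table in relative terms, the short sketch below computes each quantized model's drop against the unquantized baseline. The scores are copied from the table; the helper name `relative_drop` is only illustrative.

```python
# Scores copied from the table above (ELYZA-tasks-100, GPT-4 rated).
BASELINE_SCORE = 3.655
QUANTIZED_SCORES = {
    "GGUF (Q4_K_M)": 3.57,
    "AWQ": 3.39,
}


def relative_drop(baseline: float, score: float) -> float:
    """Return the fraction of the baseline score lost after quantization."""
    return (baseline - score) / baseline


for name, score in QUANTIZED_SCORES.items():
    absolute = BASELINE_SCORE - score
    relative = relative_drop(BASELINE_SCORE, score)
    print(f"{name}: -{absolute:.3f} points ({relative:.1%} below baseline)")
```

On these numbers the GGUF variant loses roughly 2% of the baseline score and the AWQ variant roughly 7%.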

## Use with vLLM

Install vLLM:

```bash
pip install vllm
```

### vLLM Offline Batched Inference

```python
from vllm import LLM, SamplingParams

llm = LLM(model="elyza/Llama-3-ELYZA-JP-8B-AWQ", quantization="awq")
tokenizer = llm.get_tokenizer()

DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=1000)
messages_batch = [
    [
        {"role": "system", "content": DEFAULT_SYSTEM_PROMPT},
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"}
    ],
    [
        {"role": "system", "content": DEFAULT_SYSTEM_PROMPT},
        {"role": "user", "content": "クマが海辺に行ってアザラシと友達になり、最終的には家に帰るというプロットの短編小説を書いてください。"}
    ]
]

prompts = [
    tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    for messages in messages_batch
]

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    print(output.outputs[0].text)
    print("=" * 50)
```
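
For readers wondering what `apply_chat_template` produces in the example above, here is a minimal sketch of the prompt layout. It assumes this repository keeps the stock Llama 3 Instruct chat template, which has not been verified here; the function `render_llama3_prompt` is a hypothetical stand-in for illustration only, and the tokenizer's own template remains the authoritative source.

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Approximate the stock Llama 3 Instruct chat layout (assumption)."""
    # The template starts with the beginning-of-text token.
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn is a role header, a blank line, the content, then <|eot_id|>.
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header cues the model to write the reply.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


example = render_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(example)
```

In real use, prefer `tokenizer.apply_chat_template` as in the example above, since it always reflects the template shipped with the model.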


### vLLM OpenAI Compatible Server

Start the API server:
```bash
python -m vllm.entrypoints.openai.api_server \
--model elyza/Llama-3-ELYZA-JP-8B-AWQ \
--port 8000 \
--host localhost \
--quantization awq
```


Call the API using curl:
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "elyza/Llama-3-ELYZA-JP-8B-AWQ",
  "messages": [
    { "role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。" },
    { "role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?" }
  ],
  "temperature": 0.6,
  "max_tokens": 1000,
  "stream": false
}'
```

Call the API using Python:
```python
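import openai

# The server started above does not check the key, but the client
# requires a non-empty value, so a placeholder is passed.
client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy_api_key",
)

completion = client.chat.completions.create(
    model="elyza/Llama-3-ELYZA-JP-8B-AWQ",
    messages=[
        {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"},
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"}
    ]
)

# Print the generated reply.
print(completion.choices[0].message.content)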
```
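
If the `openai` package is not installed, the same chat-completions request can be sketched with Python's standard library alone. The helper names `build_chat_payload` and `post_chat` are illustrative and not part of vLLM or this repository. The payload mirrors the curl example, and the server is assumed to be listening on `localhost:8000` as started earlier.

```python
import json
import urllib.request

DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"


def build_chat_payload(user_message, system_prompt=DEFAULT_SYSTEM_PROMPT,
                       temperature=0.6, max_tokens=1000):
    """Build the JSON body for /v1/chat/completions (same fields as the curl call)."""
    return {
        "model": "elyza/Llama-3-ELYZA-JP-8B-AWQ",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": False,
    }


def post_chat(payload, base_url="http://localhost:8000/v1"):
    """POST the payload to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(post_chat(build_chat_payload("古代ギリシャを学ぶ上で知っておくべきポイントは?")))` should print the model's reply.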

## Developers

Listed in alphabetical order.

- [Masato Hirakawa](https://huggingface.co/m-hirakawa)
- [Shintaro Horie](https://huggingface.co/e-mon)
- [Tomoaki Nakamura](https://huggingface.co/tyoyo)
- [Daisuke Oba](https://huggingface.co/daisuk30ba)
- [Sam Passaglia](https://huggingface.co/passaglia)
- [Akira Sasaki](https://huggingface.co/akirasasaki)

## License

[Meta Llama 3 Community License](https://llama.meta.com/llama3/license/)

## How to Cite

```tex
@misc{elyzallama2024,
      title={elyza/Llama-3-ELYZA-JP-8B},
      url={https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B},
      author={Masato Hirakawa and Shintaro Horie and Tomoaki Nakamura and Daisuke Oba and Sam Passaglia and Akira Sasaki},
      year={2024},
}
```

## Citations

```tex
@article{llama3modelcard,
    title={Llama 3 Model Card},
    author={AI@Meta},
    year={2024},
    url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```