---
library_name: transformers
tags: []
---

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/e2VLH4eBlq3678PsI_itw.png" alt="drawing" width="512"/>
</p>

# How to use ・ 使い方

We recommend running this model in an environment with at least 60GB of VRAM, ideally one or more A100 (80GB) GPUs.

### Huggingface
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

# Load the AWQ 4-bit quantized checkpoint and spread it across available GPUs.
tokenizer = AutoTokenizer.from_pretrained("lightblue/ao-karasu-72B-AWQ-4bit")
model = AutoModelForCausalLM.from_pretrained("lightblue/ao-karasu-72B-AWQ-4bit", device_map="auto")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Build a chat-formatted prompt from a system message and a user question.
messages = [{"role": "system", "content": "あなたはAIアシスタントです。"}]
messages.append({"role": "user", "content": "イギリスの首相は誰ですか?"})

prompt = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)

# Greedy decoding; return_full_text=False returns only the newly generated text.
pipe(prompt, max_new_tokens=100, do_sample=False, return_full_text=False)
```
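
The final `pipe(...)` call returns a list of generation dictionaries. A minimal sketch of printing just the model's reply (with `return_full_text=False`, the prompt itself is excluded from the output):

```python
# The text-generation pipeline returns a list like [{"generated_text": "..."}].
result = pipe(prompt, max_new_tokens=100, do_sample=False, return_full_text=False)
print(result[0]["generated_text"])
```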


### vLLM
```python
from vllm import LLM, SamplingParams

# Greedy decoding (temperature 0.0), up to 100 new tokens per prompt.
sampling_params = SamplingParams(temperature=0.0, max_tokens=100)
llm = LLM(model="lightblue/aokarasu-72B-AWQ-4bit")

# Build a chat-formatted prompt using the model's chat template.
messages = [{"role": "system", "content": "あなたはAIアシスタントです。"}]
messages.append({"role": "user", "content": "イギリスの首相は誰ですか?"})
prompt = llm.llm_engine.tokenizer.tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
prompts = [prompt]

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
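
vLLM can also generate for several conversations in a single batched call. A minimal sketch, reusing the `llm` and `sampling_params` objects from the block above (the second question here is purely illustrative):

```python
# Hypothetical extra question added for illustration ("What is the capital of Japan?").
questions = ["イギリスの首相は誰ですか?", "日本の首都はどこですか?"]

conversations = [
    [{"role": "system", "content": "あなたはAIアシスタントです。"},
     {"role": "user", "content": q}]
    for q in questions
]

# Apply the chat template to each conversation, then generate the whole batch at once.
prompts = [
    llm.llm_engine.tokenizer.tokenizer.apply_chat_template(
        conversation=c, add_generation_prompt=True, tokenize=False
    )
    for c in conversations
]

for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```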

# Training details 学習詳細

[English dev blog](https://note.com/peter_lightblue/n/n483d194d3614?sub_rt=share_pw)


[Japanese blog](https://note.com/lightblue_tech/n/nfda12435b262?sub_rt=share_pw)

# Training data 学習データ

Roughly 20 million characters sampled from a dataset of more than 1.1 billion characters, which was made up of:

- ~450 million characters from Wikipedia-based QA (same as Qarasu)
- ~200 million characters from technical blogs (new)
- ~200 million characters from Japanese QA site answers (new)
- ~100 million characters from LLM generated prompts and responses (same as Qarasu)
- ~70 million characters from news articles (new)

# Training schedule

Trained for ~1 day on an A100 (80GB) GPU.