---
language:
- ja
- en
license: mit
---

# Sarashina2.2-1B

This repository provides large language models trained by [SB Intuitions](https://www.sbintuitions.co.jp/).

## How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, set_seed

# Load the model in bfloat16 and place it automatically on the available device(s).
model = AutoModelForCausalLM.from_pretrained("sbintuitions/sarashina2.2-1b", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2.2-1b")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
set_seed(123)

# Prompt: "おはようございます、今日の天気は" ("Good morning, today's weather is")
text = generator(
    "おはようございます、今日の天気は",
    max_length=30,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    num_return_sequences=3,
)

for t in text:
    print(t)
```

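Each element of `text` is a dictionary whose `generated_text` field holds the prompt followed by the sampled continuation. If you prefer to bypass the pipeline, the following sketch calls `generate` directly with the model and tokenizer loaded above; the decoding settings are only illustrative, not recommendations from this repository.

```python
# Minimal sketch (not from this repository): greedy decoding via generate().
inputs = tokenizer("おはようございます、今日の天気は", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
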
## Model Description

We constructed the Sarashina2.2-1B model, which consists of about 1 billion parameters (excluding the embeddings and LM head from the parameter count), using a three-phase training process.
First, we trained the model on 10 trillion tokens, including Japanese, English, and code data extracted from web corpora.
Next, we trained the model on synthetic data to improve its performance on math and coding tasks.
Finally, we trained the model on a small amount of data to enhance its performance on various application tasks.

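If you want to reproduce this parameter count, the sketch below (our own illustration, not part of this repository) tallies the parameters of the model loaded in the example above while excluding the input embeddings and the LM head.

```python
# Illustrative sketch: count parameters excluding the input embeddings and the LM head,
# matching the definition of the "1B" figure above. Shared (tied) weights are excluded once.
excluded = {id(p) for p in model.get_input_embeddings().parameters()}
if model.get_output_embeddings() is not None:
    excluded |= {id(p) for p in model.get_output_embeddings().parameters()}
non_embedding_params = sum(p.numel() for p in model.parameters() if id(p) not in excluded)
print(f"Non-embedding parameters: {non_embedding_params:,}")
```
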
The following table shows the model's performance on Japanese tasks.
For reference, we also present the performance of our previous LLMs.
As shown in the table, our Sarashina2.2-3B outperforms Sarashina2-7B on Japanese QA tasks such as NIILC and JMMLU.
In addition, Sarashina2.2-3B outperforms Sarashina2-70B on Japanese math and coding tasks such as MGSM-ja and JHumanEval.

#### Evaluation in Japanese tasks

| Model | NIILC | JMMLU | MGSM-ja | JHumanEval |
|------------------|------------|------------|-----------|------------|
| [Sarashina2-7B](https://huggingface.co/sbintuitions/sarashina2-7b) | 62.2 | 42.5 | 7.2 | 12.8 |
| [Sarashina2-70B](https://huggingface.co/sbintuitions/sarashina2-70b) | **66.1** | **62.7** | 56.4 | 22.0 |
| **[Sarashina2.2-0.5B](https://huggingface.co/sbintuitions/sarashina2.2-0.5b)** | 34.6 | 28.8 | 21.2 | 15.2 |
| **[Sarashina2.2-1B](https://huggingface.co/sbintuitions/sarashina2.2-1b)** | 47.2 | 38.4 | 38.8 | 21.3 |
| **[Sarashina2.2-3B](https://huggingface.co/sbintuitions/sarashina2.2-3b)** | 62.2 | 52.7 | **63.6** | **39.6** |

## Ethical Considerations and Limitations

This repository contains the pre-trained model, which has not yet been tuned to follow instructions.
Therefore, this model may generate meaningless sequences, inaccurate content, or biased/objectionable outputs.
As post-trained versions of Sarashina2.2, we have published [Sarashina2.2-0.5B-instruct-v0.1](https://huggingface.co/sbintuitions/sarashina2.2-0.5b-instruct-v0.1), [Sarashina2.2-1B-instruct-v0.1](https://huggingface.co/sbintuitions/sarashina2.2-1b-instruct-v0.1), and [Sarashina2.2-3B-instruct-v0.1](https://huggingface.co/sbintuitions/sarashina2.2-3b-instruct-v0.1).

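The sketch below shows one way to query an instruct variant; it assumes the instruct tokenizer ships a chat template, and the prompt and decoding settings are only illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch (not an official recipe): load one of the post-trained variants.
model_name = "sbintuitions/sarashina2.2-1b-instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Assumes the instruct tokenizer provides a chat template.
messages = [{"role": "user", "content": "おはようございます。自己紹介をしてください。"}]  # "Good morning. Please introduce yourself."
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
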
## License

[MIT License](https://huggingface.co/sbintuitions/sarashina2.2-1b/blob/main/LICENSE)