---
language:
- en
- ko
pipeline_tag: text-generation
license: cc-by-nc-sa-4.0
---

# **Twice-KoSOLAR-16.1B-instruct-test**  

## Model Details

**Model Developers** Kyujin Han (kyujinpy)

**Model Purpose**
<img src='./solar.png'>

Recently, the SOLAR-10.7B model has been posting strong results on the LLM leaderboard on the strength of the [Depth-Up-Scaling](https://arxiv.org/pdf/2312.15166.pdf) methodology (pictured above). In addition, the `seungduk/KoSOLAR-10.7B-v0.1` model built by `Yanolja` has made a big impact on the Ko-LLM leaderboard, and the direction of that leaderboard is expected to change going forward.
  
This raised a simple question for me. **The Depth-Up-Scaling (DUS) methodology published by Upstage merges two mistral-7B models with a passthrough merge.**  
Remarkably, the DUS-based `upstage/SOLAR-10.7B-v1.0` model scored higher on the leaderboard than the original mistral-7B model (see the table below).  
So I was curious whether applying DUS to other models, without restriction, would produce the same result. 🙃
I want to settle that question through this experiment. 😋😋
  
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
| --- | --- | --- | --- | --- | --- | --- | --- | 
| [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1) | **66.04** | 62.03 | 84.54 | 65.56 | 45.03 | 83.58 | 55.50 |
| [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) | **66.04** | 61.95 | 84.60 | 65.48 | 45.04 | 83.66 | 55.50 |
| [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 60.97 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 37.83 |
> Scores from the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) (English).  
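
To make the passthrough idea concrete, here is a minimal, illustrative sketch of a DUS-style depth merge in plain `transformers`/`torch`. It is not the exact recipe used to build this model: the donor checkpoint, the number of dropped layers `k`, and the output path are assumptions for illustration, and the full DUS pipeline also continues pretraining after the merge.

```python
# Illustrative DUS-style passthrough merge (sketch only, not the exact recipe used here).
# Two copies of the same donor model are stacked after dropping k layers of overlap,
# giving 2 * (n - k) decoder layers in total.
import copy
import torch
from transformers import AutoModelForCausalLM

donor_name = "seungduk/KoSOLAR-10.7B-v0.1"   # assumption: donor checkpoint
donor = AutoModelForCausalLM.from_pretrained(donor_name, torch_dtype=torch.float16)

layers = donor.model.layers                  # decoder layers of a Llama/Mistral-style model
n = len(layers)
k = 8                                        # assumption: overlap removed, as in the SOLAR paper

# Bottom copy keeps layers [0, n-k); top copy keeps layers [k, n).
bottom = [layers[i] for i in range(n - k)]
top = [copy.deepcopy(layers[i]) for i in range(k, n)]

donor.model.layers = torch.nn.ModuleList(bottom + top)
donor.config.num_hidden_layers = len(donor.model.layers)

# Weights only; DUS would continue pretraining the merged model afterwards.
donor.save_pretrained("./DUS-passthrough-sketch")
```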
   
**Method**   
Instruction-tuning.

**Hyperparameters**    
```bash
python finetune.py \
    --base_model PracticeLLM/Twice-KoSOLAR-16.1B-test \
    --data-path  kyujinpy/KOR-OpenOrca-Platypus-v3 \
    --output_dir ./Twice-KoSOLAR-16.1B-instruct-test \
    --batch_size 64 \
    --micro_batch_size 1 \
    --num_epochs 1 \
    --learning_rate 3e-5 \
    --cutoff_len 4096 \
    --val_set_size 0 \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules '[q_proj, k_proj, v_proj, o_proj, gate_proj, down_proj, up_proj, lm_head]' \
    --train_on_inputs False \
    --add_eos_token False \
    --group_by_length False \
    --prompt_template_name user_prompt \
    --lr_scheduler 'cosine' \
    #--warmup_steps 100 \
```
> Sharing everything is my belief.  
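
For reference, a rough PEFT-level sketch of the LoRA settings above: the rank, alpha, dropout, and target modules mirror the command line, while the library calls and base-model loading are assumptions rather than the actual training script.

```python
# Rough PEFT equivalent of the LoRA hyperparameters above (sketch, not the training script).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("PracticeLLM/Twice-KoSOLAR-16.1B-test")

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "down_proj", "up_proj", "lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```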
    
# **Model Benchmark**  

## Open Ko-LLM Leaderboard & lm-evaluation-harness (zero-shot)
- Scores from the [Open Ko-LLM Leaderboard](https://huggingface.co/spaces/upstage/open-ko-llm-leaderboard):

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Ko-CommonGenV2 |
| --- | --- | --- | --- | --- | --- | --- | 
| PracticeLLM/Twice-KoSOLAR-16.1B-instruct-test | NaN | NaN | NaN | NaN | NaN | NaN |
| PracticeLLM/Twice-KoSOLAR-16.1B-test | 50.20 | 45.65 | 57.14 | 51.39 | 42.99 | 53.84 |
| [Megastudy/M-SOLAR-10.7B-v1.1-beta](https://huggingface.co/Megastudy/M-SOLAR-10.7B-v1.1-beta) | 55.25 | 51.71 | 60.86 | 54.24 | 47.12 | 62.34 |  
| [jjourney1125/M-SOLAR-10.7B-v1.0](https://huggingface.co/jjourney1125/M-SOLAR-10.7B-v1.0) | 55.15 | 49.57 | 60.12 | 54.60 | 49.23 | 62.22 |  
| [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1) | 52.40 | 47.18 | 59.54 | 52.04 | 41.84 | 61.39 |
  
- Zero-shot results from [beomi/ko-lm-evaluation-harness](https://github.com/Beomi/ko-lm-evaluation-harness):  
```
gpt2 (pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|      Task      |Version| Metric |Value |   |Stderr|
|----------------|------:|--------|-----:|---|-----:|
|kobest_boolq    |      0|acc     |0.7201|±  |0.0120|
|                |       |macro_f1|0.7073|±  |0.0124|
|kobest_copa     |      0|acc     |0.6510|±  |0.0151|
|                |       |macro_f1|0.6506|±  |0.0151|
|kobest_hellaswag|      0|acc     |0.4520|±  |0.0223|
|                |       |acc_norm|0.5820|±  |0.0221|
|                |       |macro_f1|0.4475|±  |0.0222|
|kobest_sentineg |      0|acc     |0.7078|±  |0.0229|
|                |       |macro_f1|0.7071|±  |0.0229|

gpt2 (pretrained=Megastudy/M-SOLAR-10.7B-v1.1-beta), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|      Task      |Version| Metric |Value |   |Stderr|
|----------------|------:|--------|-----:|---|-----:|
|kobest_boolq    |      0|acc     |0.7137|±  |0.0121|
|                |       |macro_f1|0.6878|±  |0.0128|
|kobest_copa     |      0|acc     |0.7060|±  |0.0144|
|                |       |macro_f1|0.7054|±  |0.0145|
|kobest_hellaswag|      0|acc     |0.4620|±  |0.0223|
|                |       |acc_norm|0.5360|±  |0.0223|
|                |       |macro_f1|0.4595|±  |0.0223|
|kobest_sentineg |      0|acc     |0.7431|±  |0.0220|
|                |       |macro_f1|0.7295|±  |0.0230|

gpt2 (pretrained=jjourney1125/M-SOLAR-10.7B-v1.0), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|      Task      |Version| Metric |Value |   |Stderr|
|----------------|------:|--------|-----:|---|-----:|
|kobest_boolq    |      0|acc     |0.5228|±  |0.0133|
|                |       |macro_f1|0.3788|±  |0.0097|
|kobest_copa     |      0|acc     |0.6860|±  |0.0147|
|                |       |macro_f1|0.6858|±  |0.0147|
|kobest_hellaswag|      0|acc     |0.4580|±  |0.0223|
|                |       |acc_norm|0.5380|±  |0.0223|
|                |       |macro_f1|0.4552|±  |0.0222|
|kobest_sentineg |      0|acc     |0.6474|±  |0.0240|
|                |       |macro_f1|0.6012|±  |0.0257|

gpt2 (pretrained=yanolja/KoSOLAR-10.7B-v0.1), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|      Task      |Version| Metric |Value |   |Stderr|
|----------------|------:|--------|-----:|---|-----:|
|kobest_boolq    |      0|acc     |0.8725|±  |0.0089|
|                |       |macro_f1|0.8722|±  |0.0089|
|kobest_copa     |      0|acc     |0.6850|±  |0.0147|
|                |       |macro_f1|0.6844|±  |0.0147|
|kobest_hellaswag|      0|acc     |0.4340|±  |0.0222|
|                |       |acc_norm|0.5840|±  |0.0221|
|                |       |macro_f1|0.4296|±  |0.0221|
|kobest_sentineg |      0|acc     |0.7506|±  |0.0217|
|                |       |macro_f1|0.7505|±  |0.0217|
```
    
# Implementation Code
```python
# Load Twice-KoSOLAR-16.1B-instruct-test (instruction-tuned on kyujinpy/KOR-OpenOrca-Platypus-v3)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "PracticeLLM/Twice-KoSOLAR-16.1B-instruct-test"
# Load the model in half precision and shard it across the available devices
OpenOrca = AutoModelForCausalLM.from_pretrained(
        repo,
        return_dict=True,
        torch_dtype=torch.float16,
        device_map='auto'
)
# Matching tokenizer
OpenOrca_tokenizer = AutoTokenizer.from_pretrained(repo)
```
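
A simple generation call with the loaded model might look like the sketch below. Fine-tuning used the `user_prompt` template, so the prompt format shown here is only an assumption.

```python
# Minimal generation example (prompt format is an assumption, not the official template).
prompt = "### User:\nTell me briefly about the SOLAR-10.7B model.\n\n### Assistant:\n"
inputs = OpenOrca_tokenizer(prompt, return_tensors="pt").to(OpenOrca.device)
outputs = OpenOrca.generate(**inputs, max_new_tokens=128)
print(OpenOrca_tokenizer.decode(outputs[0], skip_special_tokens=True))
```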
  
--- References (Model Card)
# yanolja/KoSOLAR-10.7B-v0.1

This model is a Korean vocabulary-extended version of [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0), trained on various Korean web-crawled datasets that are publicly available on HuggingFace.
The hypothesis was that while maintaining the original performance of the base model, we could add more tokens to the base model's vocabulary by training the embeddings for the new tokens only. The evaluation results seem to indicate that both English and Korean performances were preserved.

## Model Description

Most parameters of [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) were frozen except for the embed_tokens layer and the lm_head layer. Embeddings for the existing tokens in those layers were frozen during training. The embeddings for the new tokens have been tuned.
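
The partial-freezing idea described above could be sketched roughly as follows. This is illustrative only, not Yanolja's actual training code; the added tokens are placeholders, and the gradient-hook trick is just one way to keep existing-token rows fixed.

```python
# Sketch: train only the new-token rows of embed_tokens and lm_head, freeze everything else
# (illustrative; not the actual KoSOLAR training code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "upstage/SOLAR-10.7B-v1.0"
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(name)

old_vocab_size = len(tokenizer)
tokenizer.add_tokens(["새로운", "토큰"])            # placeholder new Korean tokens
model.resize_token_embeddings(len(tokenizer))

# Keep only the embedding and output layers trainable.
for param_name, param in model.named_parameters():
    param.requires_grad = param_name.endswith(("embed_tokens.weight", "lm_head.weight"))

def zero_old_rows(grad):
    # Zero out gradients for rows belonging to pre-existing tokens.
    grad = grad.clone()
    grad[:old_vocab_size] = 0
    return grad

model.get_input_embeddings().weight.register_hook(zero_old_rows)
model.get_output_embeddings().weight.register_hook(zero_old_rows)
```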
  
---
# **Meet 10.7B Solar: Elevating Performance with Upstage Depth UP Scaling!**


# **Introduction**
We introduce SOLAR-10.7B, an advanced large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. It's compact, yet remarkably powerful, and demonstrates unparalleled state-of-the-art performance in models with parameters under 30B.

We present a methodology for scaling LLMs called depth up-scaling (DUS), which encompasses architectural modifications and continued pretraining. In other words, we integrated Mistral 7B weights into the upscaled layers, and finally, continued pre-training for the entire model.


SOLAR-10.7B has remarkable performance. It outperforms models with up to 30B parameters, even surpassing the recent Mixtral 8X7B model. For detailed information, please refer to the experimental table.
Solar 10.7B is an ideal choice for fine-tuning. SOLAR-10.7B offers robustness and adaptability for your fine-tuning needs. Our simple instruction fine-tuning using the SOLAR-10.7B pre-trained model yields significant performance improvements ([SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0)).

For full details of this model please read our [paper](https://arxiv.org/abs/2312.15166).