---
library_name: transformers
license: cc-by-nc-sa-4.0
---

# The license is cc-by-nc-sa-4.0.

- Commercial use is not allowed.

![mark1](ko-1.4.png)



# Model Card for Model ID

๊ธฐ์กด์˜ DopeorNope/Ko-Mixtral-v1.3-MoE-7Bx2 ๋ชจ๋ธ์—์„œ ํ–ฅ์ƒ๋œ 1.4๋ฒ„์ „์ž…๋‹ˆ๋‹ค.

์ถ”๊ฐ€๋œ ์‚ฌํ•ญ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

1. ํ›ˆ๋ จ์— ํ™œ์šฉ๋œ ์ฝ”ํผ์Šค๋ฅผ ๋งค๋‰ด์–ผํ•˜๊ฒŒ ๊ฒ€ํ† ํ•˜๊ณ  ์ด์ƒํ•œ ์ฝ”ํผ์Šค๋ฅผ ์ˆ˜์ •ํ•˜๊ณ  ์ •์ œํ•˜์˜€์Šต๋‹ˆ๋‹ค.
2. Near dudup ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•˜์—ฌ ์ค‘๋ณต๋˜๋Š” ์ฝ”ํผ์Šค๋ฅผ ์ œ๊ฑฐํ•˜์˜€์Šต๋‹ˆ๋‹ค.
3. ๊ธฐ์กด์˜ 3๊ฐ€์ง€ task์—์„œ ํ•œ๊ฐ€์ง€ task๋ฅผ ์ถ”๊ฐ€ํ•˜์˜€์Šต๋‹ˆ๋‹ค.



## Model Details

### Model Description
- **Developed by:** DopeorNope (Seungyoo Lee), kyujinpy (Kyujin Han)
- **Model type:** Mixtral
- **Language:** English-based model, fine-tuned on a Korean corpus
- **License:** cc-by-nc-sa-4.0
- **Finetuned from model:** DopeorNope/Ko-Mixtral-v1.3-MoE-7Bx2
- **Funded by:** the Ministry of Science and ICT (MSIT, Korea) & Gwangju Metropolitan City

## Training

### Training Data

Using the corpus provided by AI-HUB, the following four tasks were built through text mining and applied.

- **1. Mask prediction task**

```python

# Mask prediction

# A Korean word in a sentence is masked, and the model is asked to predict that word.
# Gloss: "Intelligence (智能) refers to a human's <MASK> ability." The masked word means "intellectual".

Text = '지능(智能) 또는 인텔리전스(intelligence)는 인간의 <MASK> 능력을 말한다.'

Response = '지적'

Complete_text = '지능(智能) 또는 인텔리전스(intelligence)는 인간의 지적 능력을 말한다.'

```
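
One way such a sample could be produced is sketched below. It assumes plain whitespace tokenization and a uniformly random choice of the word to mask; the card does not describe the actual selection rule, and `make_mask_sample` is a hypothetical helper name. The dictionary keys mirror the field names in the example above.

```python
import random

MASK_TOKEN = "<MASK>"


def make_mask_sample(sentence: str, rng: random.Random) -> dict[str, str]:
    """Mask one whitespace-delimited word and return the three example fields."""
    words = sentence.split()
    if not words:
        raise ValueError("sentence must contain at least one word")
    idx = rng.randrange(len(words))  # assumption: every word is equally likely
    masked = words.copy()
    masked[idx] = MASK_TOKEN
    return {
        "Text": " ".join(masked),
        "Response": words[idx],
        "Complete_text": " ".join(words),
    }


if __name__ == "__main__":
    sample = make_mask_sample(
        "지능(智能) 또는 인텔리전스(intelligence)는 인간의 지적 능력을 말한다.",
        random.Random(0),
    )
    print(sample["Text"])
    print(sample["Response"])
```

Passing a seeded `random.Random` keeps the generated samples reproducible across runs.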
- **2. Text-align task**

```python

#Text-allign Task

#๋ฌธ๋‹จ์—์„œ ๊ฐ ๋ฌธ์žฅ๋“ค์„ ์ถ”์ถœํ•˜๊ณ  ์ถ”์ถœํ•œ ๋ฌธ์žฅ๋“ค์„ ๋ฌด์ž‘์œ„๋กœ ์„ž์€ ํ›„ ์„ž์€ ๋ฌธ์žฅ๋“ค์„ ๋ฌธ๋งฅ์ƒ ์ ์ ˆํ•˜๊ฒŒ ๋ฐฐ์—ดํ•˜๋Š” ํƒœ์ŠคํŠธ ์ž…๋‹ˆ๋‹ค.

Text_list=['๋ณต์ˆ˜๋ช…๋ น-๋ณต์ˆ˜์ž๋ฃŒ(MIMD,Multiple Instruction, Multiple Data)์€ ์ „์‚ฐ์—์„œ ๋ณ‘๋ ฌํ™”์˜ ํ•œ ๊ธฐ๋ฒ•์ด๋‹ค.',
           '๋ถ„์‚ฐ ๋ฉ”๋ชจ๋ฆฌ์˜ ์˜ˆ๋Š” MPP(massively parallel processors)์™€ COW (Clusters of Workstations)์ด๋‹ค.',
           'MIMD๊ธฐ๊ณ„๋Š” ๊ณต์œ  ๋ฉ”๋ชจ๋ฆฌ์ด๊ฑฐ๋‚˜ ๋ถ„์‚ฐ ๋ฉ”๋ชจ๋ฆฌ์ด๋ฉฐ ์ด๋Ÿฌํ•œ ๋ถ„๋ฅ˜๋Š” MIMD๊ฐ€ ์–ด๋–ป๊ฒŒ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์ด์šฉํ•˜๋Š๋ƒ์— ๋”ฐ๋ผ ๋‚˜๋‰œ๋‹ค.']



Response='๋ณต์ˆ˜๋ช…๋ น-๋ณต์ˆ˜์ž๋ฃŒ(MIMD,Multiple Instruction, Multiple Data)์€ ์ „์‚ฐ์—์„œ ๋ณ‘๋ ฌํ™”์˜ ํ•œ ๊ธฐ๋ฒ•์ด๋‹ค.
          MIMD๊ธฐ๊ณ„๋Š” ๊ณต์œ  ๋ฉ”๋ชจ๋ฆฌ์ด๊ฑฐ๋‚˜ ๋ถ„์‚ฐ ๋ฉ”๋ชจ๋ฆฌ์ด๋ฉฐ ์ด๋Ÿฌํ•œ ๋ถ„๋ฅ˜๋Š” MIMD๊ฐ€ ์–ด๋–ป๊ฒŒ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์ด์šฉํ•˜๋Š๋ƒ์— ๋”ฐ๋ผ ๋‚˜๋‰œ๋‹ค.
          ๋ถ„์‚ฐ ๋ฉ”๋ชจ๋ฆฌ์˜ ์˜ˆ๋Š” MPP(massively parallel processors)์™€ COW (Clusters of Workstations)์ด๋‹ค.'

```
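
A minimal sketch of how a text-align sample might be assembled from sentences that are already split. The helper name `make_text_align_sample` and the choice to join the target with single spaces are assumptions for illustration, not the authors' code.

```python
import random


def make_text_align_sample(sentences: list[str], rng: random.Random) -> dict[str, object]:
    """Shuffle a copy of the sentences; the original order is the target."""
    shuffled = sentences.copy()  # leave the caller's list untouched
    rng.shuffle(shuffled)
    return {"Text_list": shuffled, "Response": " ".join(sentences)}


if __name__ == "__main__":
    ordered = ["First sentence.", "Second sentence.", "Third sentence."]
    sample = make_text_align_sample(ordered, random.Random(0))
    print(sample["Text_list"])
    print(sample["Response"])
```

A shuffle can occasionally return the original order, so a real pipeline may want to reshuffle or discard such samples.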

- **3. Text completion task**

```python

#Text Completion

# ๋ฌธ๋‹จ์—์„œ ๋งˆ์ง€๋ง‰ ๋ฌธ์žฅ์„ ์ถ”์ถœํ•˜๊ณ , ์ถ”์ถœ๋œ ๋ฌธ์žฅ์˜ ์ด์ „์˜ ๋ฌธ๋‹จ๊นŒ์ง€๋ฅผ input์œผ๋กœ ํ•˜์—ฌ ๋งˆ์ง€๋ง‰ ๋ฌธ์žฅ์„ ์˜ˆ์ธกํ•˜๊ฒŒ ํ•˜๋Š” ๊ณผ์ œ์ž…๋‹ˆ๋‹ค.

Text= '๊ทธ๋ฆฐ๋ธŒ๋ผ์šฐ์ €(GreenBrowser)๋Š” ์ธํ„ฐ๋„ท ์ต์Šคํ”Œ๋กœ๋Ÿฌ์—์„œ ์‚ฌ์šฉํ•˜๋Š” ํŠธ๋ผ์ด๋˜ํŠธ ๋ ˆ์ด์•„์›ƒ ์—”์ง„์„ ๋ฐ”ํƒ•์œผ๋กœ ํ•˜๋ฉฐ ์ค‘๊ตญ์— ๊ธฐ๋ฐ˜์„ ๋‘” ์†Œํ”„ํŠธ์›จ์–ด ํšŒ์‚ฌ์ธ ๋ชจ์–ดํ€ต(morequick)์—์„œ ๋งŒ๋“  ๋ฌด๋ฃŒ ์›น ๋ธŒ๋ผ์šฐ์ €๋‹ค. ๊ฐ„์ฒด์ž ์ค‘๊ตญ์–ด๊ฐ€ ์›น ๋ธŒ๋ผ์šฐ์ €์— ๋‚ด์žฅ๋˜์–ด ์žˆ๋‹ค.
      ๋งฅ์Šคํ†ค ์›น ๋ธŒ๋ผ์šฐ์ €์™€ ๋น„์Šทํ•˜์—ฌ MyIE์™€ ๋ฐ€์ ‘ํ•˜๊ฒŒ ๊ด€๋ จ๋˜์–ด ์žˆ๋‹ค. ๋งฅ์Šคํ†ค์šฉ์˜ ์ผ๋ถ€ ํ”Œ๋Ÿฌ๊ทธ์ธ์ด ๊ทธ๋ฆฐ๋ธŒ๋ผ์šฐ์ €์—์„œ๋„ ์ž‘๋™ํ•  ๊ฒƒ์ด๋‹ค.'


Response= '์ž๋™ ์Šคํฌ๋กค, ์ž๋™ ๋ฆฌํ”„๋ ˆ์‹œ, ์ž๋™ ์ €์žฅ, ์ž๋™ ํผ ์ฑ„์šฐ๊ธฐ์™€ ๊ฐ™์€ ๋งŽ์€ ์ž๋™ํ™” ๊ธฐ๋Šฅ์ด ์žˆ๋‹ค.'

```
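
The split into context and target could look roughly like the sketch below. It assumes sentences end with `.`, `!`, or `?` followed by whitespace, which is a simplification; the card does not state how sentences were segmented, and `make_completion_sample` is a hypothetical name.

```python
import re


def make_completion_sample(paragraph: str) -> dict[str, str]:
    """Use everything before the last sentence as input and the last sentence as target."""
    # Assumption: a sentence boundary is end punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    if len(sentences) < 2:
        raise ValueError("paragraph needs at least two sentences")
    return {"Text": " ".join(sentences[:-1]), "Response": sentences[-1]}


if __name__ == "__main__":
    sample = make_completion_sample("First one. Second one. Last one.")
    print(sample["Text"])
    print(sample["Response"])
```

Paragraphs with only one sentence are rejected because they would leave the input empty.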
- **4. Sentence generation task**

```python

# Sentence generation

# All words are extracted from a sentence and shuffled at random, duplicate words are removed,
# and the model is asked to generate a complete sentence from the given word list.
# Gloss of Word_List: "of φ", "in control", "control and", "expression", "ψ", "robot",
# "is used", "θ", "such as", "often", "device".

Word_List = ['φ의', '제어에서는', '제어와', '표현이', 'ψ', '로봇', '쓰인다', 'θ', '같은', '자주', '기기']

response = '자동 스크롤, 자동 리프레시, 자동 저장, 자동 폼 채우기와 같은 많은 자동화 기능이 있다.'

```
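
The word-list construction described above (extract, shuffle, then drop duplicates) can be sketched as follows. Whitespace tokenization and the stripping of `.,!?` from word edges are assumptions made here because the example list carries no punctuation; `make_sentence_generation_sample` is a hypothetical name.

```python
import random


def make_sentence_generation_sample(sentence: str, rng: random.Random) -> dict[str, object]:
    """Build a shuffled, duplicate-free word list; the full sentence is the target."""
    # Assumption: strip simple punctuation from word edges and skip empty tokens.
    words = [w.strip(".,!?") for w in sentence.split()]
    words = [w for w in words if w]
    rng.shuffle(words)
    unique_words = list(dict.fromkeys(words))  # removes repeats, keeps shuffled order
    return {"Word_List": unique_words, "Response": sentence}


if __name__ == "__main__":
    sample = make_sentence_generation_sample("the cat saw the dog.", random.Random(0))
    print(sample["Word_List"])
    print(sample["Response"])
```

Removing duplicates after the shuffle follows the order of steps given in the task description.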

### Environments


- **Hardware Type:** NVIDIA A100 x 4
- **Training time:** 3 days