franknoh committed 578a47c • 1 parent: 928a4d7

Update README.md
---
license: apache-2.0
language:
- ko
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- speech
- audio
---

# hubert-base-korean

## Model Details

HuBERT (Hidden-Unit BERT) is a speech representation learning model proposed by Facebook. Unlike conventional speech recognition models, HuBERT learns representations directly from the raw waveform using self-supervised learning.

This model was trained on Cloud TPUs provided through Google's TPU Research Cloud (TRC) program.

### Model Description

<table>
<tr>
<td colspan="2"></td>
<td>Base</td>
<td>Large</td>
</tr>
<tr>
<td rowspan="3">CNN Encoder</td>
<td>strides</td>
<td colspan="2">5, 2, 2, 2, 2, 2, 2</td>
</tr>
<tr>
<td>kernel width</td>
<td colspan="2">10, 3, 3, 3, 3, 2, 2</td>
</tr>
<tr>
<td>channel</td>
<td colspan="2">512</td>
</tr>
<tr>
<td rowspan="4">Transformer Encoder</td>
<td>Layer</td>
<td>12</td>
<td>24</td>
</tr>
<tr>
<td>embedding dim</td>
<td>768</td>
<td>1024</td>
</tr>
<tr>
<td>inner FFN dim</td>
<td>3072</td>
<td>4096</td>
</tr>
<tr>
<td>attention heads</td>
<td>8</td>
<td>16</td>
</tr>
<tr>
<td>Projection</td>
<td>dim</td>
<td>256</td>
<td>768</td>
</tr>
<tr>
<td colspan="2">Params</td>
<td>95M</td>
<td>317M</td>
</tr>
</table>
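
With the strides and kernel widths above, the CNN encoder downsamples the 16 kHz waveform by a factor of 320, i.e. one frame per 20 ms. As a quick sanity check (a sketch using the standard no-padding convolution length formula):

```python
def conv_out_len(length, kernel, stride):
    # output length of a 1-D convolution with no padding
    return (length - kernel) // stride + 1

kernels = [10, 3, 3, 3, 3, 2, 2]
strides = [5, 2, 2, 2, 2, 2, 2]

length = 16000  # 1 second of audio at 16 kHz
for k, s in zip(kernels, strides):
    length = conv_out_len(length, k, s)

print(length)  # 49 frames, matching the [1, 49, 768] shape in the usage examples below
```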

## How to Get Started with the Model

### PyTorch

```py
import torch
from transformers import HubertModel

model = HubertModel.from_pretrained("team-lucid/hubert-base-korean")

wav = torch.ones(1, 16000)
outputs = model(wav)
print(f"Input: {wav.shape}")  # [1, 16000]
print(f"Output: {outputs.last_hidden_state.shape}")  # [1, 49, 768]
```

### JAX/Flax

```py
import jax.numpy as jnp
from transformers import FlaxAutoModel

model = FlaxAutoModel.from_pretrained("team-lucid/hubert-base-korean", trust_remote_code=True)

wav = jnp.ones((1, 16000))
outputs = model(wav)
print(f"Input: {wav.shape}")  # [1, 16000]
print(f"Output: {outputs.last_hidden_state.shape}")  # [1, 49, 768]
```
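
The model returns one hidden vector per 20 ms frame. For utterance-level tasks (e.g. speaker or emotion classification), these frame vectors are commonly averaged into a single embedding. A hypothetical downstream sketch, using a NumPy array as a stand-in for `outputs.last_hidden_state`:

```python
import numpy as np

# stand-in for outputs.last_hidden_state: [batch, frames, hidden]
hidden = np.random.rand(1, 49, 768).astype(np.float32)

# utterance-level embedding: mean over the time (frame) axis
embedding = hidden.mean(axis=1)
print(embedding.shape)  # (1, 768)
```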

## Training Details

### Training Data

The model was trained on approximately 4,000 hours of speech extracted from [Free Conversation Speech (General Male and Female)](https://www.aihub.or.kr/aihubdata/data/view.do?dataSetSn=109), [Multi-Speaker Speech Synthesis Data](https://www.aihub.or.kr/aihubdata/data/view.do?dataSetSn=542), and [Broadcast Content Conversational Speech Recognition Data](https://www.aihub.or.kr/aihubdata/data/view.do?dataSetSn=463), datasets built with the support of the National Information Society Agency (NIA), funded by the Ministry of Science and ICT of Korea.

### Training Procedure

As in the [original paper](https://arxiv.org/pdf/2106.07447.pdf), a Base model was first trained on MFCC-based targets; k-means with 500 clusters was then run on its features, and the Base and Large models were trained again on the resulting labels.

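The k-means step turns continuous frame-level features into discrete pseudo-labels that the masked-prediction objective can target. A toy illustration of that labeling step (a minimal sketch with scalar "features" and 2 clusters, not the training code; the real setup clusters MFCC or transformer features into 500 units):

```python
import random

def kmeans_labels(points, k, iters=20, seed=0):
    # Lloyd's algorithm; returns the nearest-centroid index per frame
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]

frames = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9]  # toy frame-level features
labels = kmeans_labels(frames, k=2)
print(labels)  # frames in the same acoustic region share a pseudo-label
```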
#### Training Hyperparameters

| Hyperparameter      |    Base |   Large |
|:--------------------|--------:|--------:|
| Warmup Steps        |  32,000 |  32,000 |
| Learning Rate       |    5e-4 |  1.5e-3 |
| Batch Size          |     128 |     128 |
| Weight Decay        |    0.01 |    0.01 |
| Max Steps           | 400,000 | 400,000 |
| Learning Rate Decay |     0.1 |     0.1 |
| Adam \\(\beta_1\\)  |     0.9 |     0.9 |
| Adam \\(\beta_2\\)  |    0.99 |    0.99 |
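
The table does not spell out how "Learning Rate Decay 0.1" is applied. Assuming the common recipe of linear warmup to the peak rate followed by linear decay to 0.1× the peak (an assumption, not the actual training code), the Base schedule would look like:

```python
def lr_schedule(step, peak=5e-4, warmup=32000, max_steps=400000, final_ratio=0.1):
    """Linear warmup to `peak`, then linear decay to final_ratio * peak."""
    if step < warmup:
        return peak * step / warmup
    frac = (step - warmup) / (max_steps - warmup)
    return peak * (1.0 - (1.0 - final_ratio) * frac)

for step in (0, 16000, 32000, 200000, 400000):
    print(step, lr_schedule(step))
```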