heegyu committed on
Commit
dfedd96
Parent: 575c30e

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -17,15 +17,15 @@ pipeline_tag: text-generation
  - GPT2 (Flax, PyTorch)
  - 12 layers, 768 hidden dim, 3072 intermediate, 12 heads, 51200 vocab size
  - 1024 max_seq_len
- - Number of parameters: 350M
+ - Number of parameters: 216M
 
  ## Training environment and hyperparameters
  - TPU v2-8
- - Learning Rate: 3e-4, Batch Size: 512 (= 64 accum x 8 devices), Scheduler: Linear, Warmup: 1000 steps
+ - Learning Rate: 5e-4, Batch Size: 512 (= 64 accum x 8 devices), Scheduler: Linear, Warmup: 1000 steps
  - Optimizer: AdamW (adam_beta1=0.9, adam_beta2=0.98, weight_decay=0.01)
  - Training Steps: 43247 (3 epochs)
  - Number of training tokens: 21.11B (43247 * 512 * 1024 seq / 1024^3)
- - Training period: 2023/1/17 ~ 2023/1/19 (2 days 6 hours)
+ - Training period: 2023/1/25 ~ 2023/1/29
  - Training code: https://github.com/HeegyuKim/language-model
 
  ## Data used for training
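
The README after this commit describes a GPT-2-small-shaped model and its optimizer setup. Below is a minimal sketch of that configuration using the Hugging Face transformers and PyTorch APIs; this is illustrative only and is an assumption about how the pieces fit together, not the author's code (the actual training code is Flax-based and lives at https://github.com/HeegyuKim/language-model).

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel, get_linear_schedule_with_warmup

# Model shape from the README: GPT-2 small dimensions with a 51200-token vocab.
config = GPT2Config(
    vocab_size=51200,   # 51200 vocab size
    n_positions=1024,   # 1024 max_seq_len
    n_embd=768,         # 768 hidden dim
    n_layer=12,         # 12 layers
    n_head=12,          # 12 heads
    n_inner=3072,       # 3072 intermediate
)
model = GPT2LMHeadModel(config)

# Hyperparameters as of this commit: lr 5e-4, AdamW betas (0.9, 0.98),
# weight decay 0.01, linear schedule with 1000 warmup steps over 43247 steps.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=5e-4, betas=(0.9, 0.98), weight_decay=0.01
)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=1000, num_training_steps=43247
)
```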
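The README's token count follows directly from steps × batch size × sequence length, with "B" meaning 2^30 tokens (matching the README's 1024^3 divisor). A quick arithmetic check:

```python
# steps * batch size * sequence length, "B" = 2^30 tokens (README's 1024^3)
tokens = 43247 * 512 * 1024          # = 22,673,883,136 tokens
print(f"{tokens / 1024**3:.2f}B")    # prints "21.12B", i.e. the README's ~21.11B
```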