kron_300m_8 README (last commit ca0fd43 by KaiyueWen)

Model Card

Best configuration

Hyperparameter                     Value
beta1                              0.95
block_size                         256
learning_rate                      0.0005
max_grad_norm                      1
min_lr_ratio                       0
normalize_grads                    True
partition_grads_into_blocks        True
preconditioner_init_scale          1
preconditioner_lr                  0.2
preconditioner_update_probability  0.1
train_batch_size                   128
update_prob_flat_start             2000
warmup                             1000
weight_decay                       0.7
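The table gives only final values. It does not say how the trainer uses them. The Python sketch below is a rough illustration, not this run's actual code. It collects the table into a dataclass and shows two step-dependent schedules these values may feed.

Both schedules are assumptions that this card does not state:

- The learning rate warms up linearly over `warmup` steps. It then decays linearly to `learning_rate * min_lr_ratio`, which is 0 here. The decay shape is a guess.
- The preconditioner update probability is loosely modeled on the schedule used in public PSGD Kron implementations. It stays at 1.0 for the first `update_prob_flat_start` steps. After that it decays exponentially toward a floor of `preconditioner_update_probability`.

All of the following are hypothetical names chosen for illustration: `KronSweepConfig`, `learning_rate_at`, `precond_update_prob_at`, `total_steps`, and the `decay=0.001` constant.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class KronSweepConfig:
    """Hypothetical container; values copied from the 'Best configuration' table."""
    beta1: float = 0.95
    block_size: int = 256
    learning_rate: float = 0.0005
    max_grad_norm: float = 1.0
    min_lr_ratio: float = 0.0
    normalize_grads: bool = True
    partition_grads_into_blocks: bool = True
    preconditioner_init_scale: float = 1.0
    preconditioner_lr: float = 0.2
    preconditioner_update_probability: float = 0.1
    train_batch_size: int = 128
    update_prob_flat_start: int = 2000
    warmup: int = 1000
    weight_decay: float = 0.7


def learning_rate_at(cfg: KronSweepConfig, step: int, total_steps: int) -> float:
    """Linear warmup to the peak LR, then an ASSUMED linear decay to
    learning_rate * min_lr_ratio at total_steps (the card does not give the shape)."""
    peak = cfg.learning_rate
    if step < cfg.warmup:
        return peak * step / cfg.warmup
    floor = peak * cfg.min_lr_ratio
    # Fraction of the post-warmup phase completed, capped at 1.0.
    progress = min((step - cfg.warmup) / max(1, total_steps - cfg.warmup), 1.0)
    return floor + (peak - floor) * (1.0 - progress)


def precond_update_prob_at(cfg: KronSweepConfig, step: int, decay: float = 0.001) -> float:
    """ASSUMED schedule modeled on PSGD Kron: update the preconditioner on every
    step until update_prob_flat_start, then decay exponentially toward the floor
    preconditioner_update_probability."""
    if step < cfg.update_prob_flat_start:
        return 1.0
    prob = math.exp(-decay * (step - cfg.update_prob_flat_start))
    return max(cfg.preconditioner_update_probability, prob)


if __name__ == "__main__":
    cfg = KronSweepConfig()
    for step in (0, 500, 1000, 2000, 3000, 6000, 11000):
        lr = learning_rate_at(cfg, step, total_steps=11000)
        p = precond_update_prob_at(cfg, step)
        print(f"step={step:>5}  lr={lr:.6f}  p_update={p:.3f}")
```

Under this reading, the preconditioner is refreshed on every step until step 2000. After that, refreshes become rarer, reaching about one step in ten near step 4300 with the assumed `decay`. Sampling updates this way is how a Kron-style optimizer amortizes the cost of updating its Kronecker-factored preconditioner.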