Adding `safetensors` variant of this model
#4 opened about 14 hours ago
by
SFconvertbot

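At this point, hyperparameters such as the learning rate and warmup should not be reduced; they should instead be scaled up to pretraining scale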
3
#2 opened almost 2 years ago
by
daner
So how do you call this model?
4
#1 opened almost 2 years ago
by
yjianchun