add new checkpoint trained for a hundred steps with smaller max grad norm and weight decay · 7a20e92 · pszemraj committed on Jun 9, 2022