add new checkpoint trained for a hundred steps with smaller max grad norm and weight decay
7a20e92
pszemraj
committed on