Training using native PyTorch: 3 epochs, batch size 14, block size 512, lr 1e-4 with cosine schedule
8d45360 (verified), finnstrom3693 committed on May 3
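
The hyperparameters in the commit message can be made concrete with a small sketch of a cosine learning-rate decay. This is only an illustration, not the code from this commit: the commit does not say whether warmup is used, what the minimum learning rate is, or how large the dataset is, so `MIN_LR`, `NUM_BLOCKS`, `total_steps`, and `cosine_lr` are hypothetical names and values chosen for the example.

```python
import math

# Hyperparameters taken from the commit message.
EPOCHS = 3
BATCH_SIZE = 14
BLOCK_SIZE = 512
BASE_LR = 1e-4

# Assumptions (not in the commit): no warmup, decay to zero, and a
# hypothetical dataset size used only to make the arithmetic concrete.
MIN_LR = 0.0
NUM_BLOCKS = 1400  # hypothetical number of 512-token training blocks


def total_steps(num_blocks: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps over the whole run, keeping the last partial batch."""
    return math.ceil(num_blocks / batch_size) * epochs


def cosine_lr(step: int, max_steps: int,
              base_lr: float = BASE_LR, min_lr: float = MIN_LR) -> float:
    """Learning rate at `step` under a single half-cosine decay."""
    progress = min(step, max_steps) / max_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    steps = total_steps(NUM_BLOCKS, BATCH_SIZE, EPOCHS)
    # Print the learning rate at the start, midpoint, and end of training.
    for s in (0, steps // 2, steps):
        print(f"step {s:4d}: lr = {cosine_lr(s, steps):.2e}")
```

Under these assumptions, and without gradient accumulation, each optimizer step covers 14 × 512 = 7,168 tokens. In a native PyTorch loop, `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max` set to the total step count and `eta_min=0` should follow the same curve when stepped once per optimizer step.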