Training using native PyTorch: 10 epochs, batch size 8, block size 512, lr 1e-4 (cosine schedule)
6f190f4 (verified), finnstrom3693 committed on May 7
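
For reference, the hyperparameters named in this commit message can be sketched as a plain-Python cosine learning-rate schedule. This is only an illustration under stated assumptions: the commit does not say whether warmup is used, what the minimum learning rate is, or how large the dataset is, so `MIN_LR`, `NUM_SEQUENCES`, and the helper names below are hypothetical.

```python
import math

# Values taken from the commit message.
EPOCHS = 10
BATCH_SIZE = 8
BLOCK_SIZE = 512
BASE_LR = 1e-4

# Assumptions (not stated in the commit): no warmup, decay down to zero.
MIN_LR = 0.0
# Hypothetical dataset size, used only to derive a step count.
NUM_SEQUENCES = 8000


def cosine_lr(step, total_steps, base_lr=BASE_LR, min_lr=MIN_LR):
    """Cosine-annealed learning rate at a given optimizer step (0-indexed)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def tokens_per_step(batch_size=BATCH_SIZE, block_size=BLOCK_SIZE):
    """Number of tokens consumed by one optimizer step."""
    return batch_size * block_size


steps_per_epoch = NUM_SEQUENCES // BATCH_SIZE
total_steps = steps_per_epoch * EPOCHS

if __name__ == "__main__":
    print("tokens per step:", tokens_per_step())
    print("total steps:", total_steps)
    for s in (0, total_steps // 2, total_steps):
        print(f"step {s}: lr = {cosine_lr(s, total_steps):.2e}")
```

With a minimum learning rate of zero, this is the same curve that `torch.optim.lr_scheduler.CosineAnnealingLR` follows when `eta_min=0` and `T_max` equals the total number of scheduler steps; whether the commit steps the schedule per batch or per epoch is not stated.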