training using native pytorch: 5 epochs, batch size 8, block size 512, lr 1e-4 cosine e20c6ef verified finnstrom3693 committed on May 4
training using native pytorch: 3 epochs, batch size 14, block size 512, lr 1e-4 cosine 8d45360 verified finnstrom3693 committed on May 3