[x] generate tokens
[ ] generate vocab
[ ] train a 70M model
[ ] train a 3B model

Refer to https://github.com/alasdairforsythe/tokenmonster/tree/main/training