120dee6
Implements the flop tokenizer, a sub-word tokenizer for autoregressive language modeling.

TODO:
- Better printing during encoding of file and loading / exporting?
- Include Python script for BPE training
- Add time to logging during encoding