Implements the flop tokenizer, a sub-word tokenizer for autoregressive language modeling.

TODO:

- Better printing during encoding of file and loading / exporting?
- Include Python script for BPE training
- Add time to logging during encoding
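The flop tokenizer's own training procedure is not documented here. As a rough illustration of the byte-pair-encoding (BPE) training that the TODO refers to, here is a minimal Python sketch. It is not the flop implementation. It assumes whitespace pre-tokenization and single characters as the initial symbols, and all function names (`get_pair_counts`, `merge_pair`, `train_bpe`, `encode_word`) are hypothetical. Training repeatedly merges the most frequent adjacent symbol pair. Encoding then replays the learned merges in order.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(words, pair):
    """Replace every occurrence of `pair` in every word with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules (assumption: whitespace-split, char-level start)."""
    words = Counter(tuple(w) for w in text.split())
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        merges.append(best)
        words = merge_pair(words, best)
    return merges


def encode_word(word, merges):
    """Split one word into subwords by applying the learned merges in order."""
    symbols = {tuple(word): 1}
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(next(iter(symbols)))
```

For example, `train_bpe("low low low lower lowest", 2)` learns `[('l', 'o'), ('lo', 'w')]`. With those merges, `encode_word("lowest", ...)` yields `['low', 'e', 's', 't']`. A production tokenizer would typically work on bytes rather than characters, so that any input can be encoded.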