Implements the flop tokenizer, a sub-word tokenizer for autoregressive language modeling.


TODO:
    - Better progress output while encoding a file and while loading / exporting?
    - Include Python script for BPE training
    - Add time to logging during encoding