winglian
optimize the iteration when tokenizing large datasets (#332)
fe28543 unverified