winglian
remove columns after tokenizing for pretraining (#571)
1157950 unverified