feat(dataset): add config to keep processed dataset in memory (#1152) 3db5f2f unverified Nanobit committed on Jan 20
update table for rwkv4 support, fix process count for dataset (#822) cdc71f7 unverified winglian committed on Nov 5, 2023
Attention mask and position id fixes for packing (#285) 2bb0b78 unverified winglian committed on Aug 12, 2023
Fixed pre-commit problems, fixed small bug in logging_config to handle LOG_LEVEL env var b1f4f7a theobjectivedad committed on Jul 15, 2023
add new sharegpt, refactor prompt so it can be customized later, add exception if no data is processed aac4b76 winglian committed on Jun 11, 2023
fix packing so that concatenated sequences reset the attention 9b8585d winglian committed on May 31, 2023
casts the prepared data to int16 (doesn't help with training memory) 2db9436 winglian committed on Apr 18, 2023