qwerrwe / src / axolotl / datasets.py

Commit History

Pretrain transforms (#1261)
c7cf381
unverified

winglian committed on

feat(dataset): add config to keep processed dataset in memory (#1152)
3db5f2f
unverified

Nanobit committed on

Preprocess dataset size fix (#1131)
7570446
unverified

winglian committed on

update table for rwkv4 support, fix process count for dataset (#822)
cdc71f7
unverified

winglian committed on

Correct typos in datasets.py (#639)
d1236f2
unverified

felixonmars committed on

split completion text to sequence_len (#616)
97d3776
unverified

winglian committed on

Attention mask and position id fixes for packing (#285)
2bb0b78
unverified

winglian committed on

feat: use multi-core
45ac7c4

Nanobit committed on

Fixed pre-commit problems, fixed small bug in logging_config to handle LOG_LEVEL env var
b1f4f7a

theobjectivedad committed on

pylint for duplicated code for system prompts
7b57ed7

winglian committed on

add new sharegpt, refactor prompt so it can be customized later, add exception if no data is processed
aac4b76

winglian committed on

fix packing so that concatenated sequences reset the attention
9b8585d

winglian committed on

Apply isort then black
37293dc

Nanobit committed on

Lint datasets
6abb7f6

Nanobit committed on

Lint and format
392dfd9

Nanobit committed on

fix new dataset prompt tokenizers
0f74464

winglian committed on

black formatting
2bc1a5b

winglian committed on

various bugfixes
94f5e41

winglian committed on

casts the prepared data to int16 (doesn't help with training memory)
2db9436

winglian committed on

4bit quantized support (wip)
77fca25

winglian committed on

various bugfixes
80b2ed2

winglian committed on

black formatting
a6028d3

winglian committed on

make it work with pythia in the cloud
8d959a7

winglian committed on

WIP for axolotl trainer
ce24f5e

winglian committed on