Commit History

feat(train): no batch dimension with pjit
df1fe19

boris committed on

feat(train): progress on pjit
49597a2

boris committed on

feat(train): start pjit support
0081723

boris committed on

Load from wandb artifact (#121)
f69b21b

boris committed on

Use DalleBartTokenizer. State restoration reverted to previous method:
ae983d7

Pedro Cuenca committed on

fix(train): variable not defined
4c87adf

boris committed on

feat(train): cleanup args
a2bf605

boris committed on

refactor(train): cleanup
274ba73

boris committed on

feat: custom gradient accumulation
2d07559

boris committed on

fix: style
df01fa8

boris committed on

feat(train): use MultiSteps for gradient accumulation
4fa53a5

boris committed on

Accept changes suggested by linter.
9f522b8

Pedro Cuenca committed on

Update help string for `model_name_or_path`.
290e443

Pedro Cuenca committed on

Update `resume_from_checkpoint` to use `from_pretrained`.
bb3f53e

Pedro Cuenca committed on

Load tokenizer associated to the model checkpoint, if possible.
a77c0d4

Pedro Cuenca committed on

Use model configuration unless a specific one is supplied.
5ec61cc

Pedro Cuenca committed on

fix: style
25862e8

boris committed on

feat: add more config of distributed_shampoo
89cf9ea

boris committed on

feat(train): refactor learning rate params
e2781bc

boris committed on

fix(train): handle seed_dataset
8b72ed8

boris committed on

feat: refactor TrainingArguments
adbdff9

boris committed on

fix: push_to_hub deprecated
23389f6

boris committed on

style: isort
531cd78

boris committed on

feat: add best_effort_memory_usage_reduction
4d518c7

boris committed on

fix: weight decay Adam + speed logging
7143593

boris committed on

fix: shampoo -> distributed shampoo
edae62d

boris committed on

feat: update params
604a65d

boris committed on

feat: add shampoo optimizer
0b87452

boris committed on

feat: allow abstract_init
772415c

boris committed on

fix: typo
5c84978

boris committed on

feat: log more metrics
1b757dc

boris committed on

feat: shard by host is optional
901ff72

boris committed on

feat: load data first
fdf7698

boris committed on

feat: display local TPU's
15993e3

boris committed on

fix: check local TPU instances only
87fed1b

boris committed on

feat(train): handle multi-hosts
5b533b5

boris committed on

style
a6252c9

boris committed on

feat: minor improvements
53dade7

boris committed on

fix: update model name
61c93f2

boris committed on

fix(train): update model name
b257ca8

boris committed on

feat: log num_params
1f57ad7

boris committed on

fix: adjust training script + dataloader
a96f4dc

boris committed on

style: use isort
d209547

boris committed on

feat(train): merge logged dict
baa52db

boris committed on

chore: move files
46f7469

boris committed on