Commit History

feat(train): handle distributed_shampoo in pjit
032f623

boris committed on

feat: update distributed_shampoo + fix None spec
8a9e367

boris committed on

feat(train): distributed_shampoo with pjit
cc34d07

boris committed on

feat(train): use pjit (#125)
f5239e1

boris committed on

style: unused import
7a176b9

boris committed on

fix style
f044cb8

boris committed on

feat(train): restore opt_state efficiently
1bfc1b5

boris committed on

feat(model): clean way to load on cpu
12f323d

boris committed on

feat(train): load model on CPU
3d43591

boris committed on

feat(train): different rng per node
2d212d8

boris committed on

feat(train): no batch dimension with pjit
df1fe19

boris committed on

feat(train): progress on pjit
49597a2

boris committed on

feat(train): start pjit support
0081723

boris committed on

feat: use_artifact if run existing
a5ed112

boris committed on

Load from wandb artifact (#121)
f69b21b

boris committed on

Style (isort).
f9d51f7

Pedro Cuenca committed on

feat(train): update sweep config
bbbf7c8

boris committed on

Use DalleBartTokenizer. State restoration reverted to previous method:
ae983d7

Pedro Cuenca committed on

Tokenizer, config, model can be loaded from wandb.
7e48337

Pedro Cuenca committed on

fix(train): variable not defined
4c87adf

boris committed on

feat(train): cleanup args
a2bf605

boris committed on

Merge pull request #122 from borisdayma/feat-acccum
c91ceb7

boris committed on

feat(data): support accumulation in non-streaming
88c8e06

boris committed on

refactor(train): cleanup
274ba73

boris committed on

feat: custom gradient accumulation
2d07559

boris committed on

fix: style
df01fa8

boris committed on

feat(train): use MultiSteps for gradient accumulation
4fa53a5

boris committed on

Change import order again.
2b2be9b

Pedro Cuenca committed on

Fix import order to make isort happy.
64d99b2

Pedro Cuenca committed on

Accept changes suggested by linter.
9f522b8

Pedro Cuenca committed on

Update help string for `model_name_or_path`.
290e443

Pedro Cuenca committed on

Update `resume_from_checkpoint` to use `from_pretrained`.
bb3f53e

Pedro Cuenca committed on

Never consider local dirs as remote wandb references.
08dd098

Pedro Cuenca committed on

Load tokenizer associated to the model checkpoint, if possible.
a77c0d4

Pedro Cuenca committed on

Store resolved path after loading model.
55a631d

Pedro Cuenca committed on

Use model configuration unless a specific one is supplied.
5ec61cc

Pedro Cuenca committed on

Override from_pretrained to support wandb artifacts.
1023afa

Pedro Cuenca committed on

Merge pull request #118 from borisdayma/feat-optim
193c88c

boris committed on

fix: style
25862e8

boris committed on

feat: add more config of distributed_shampoo
89cf9ea

boris committed on

fix(data): no shuffling of validation data
ddcbc6a

boris committed on

feat(train): refactor learning rate params
e2781bc

boris committed on

fix(train): handle seed_dataset
8b72ed8

boris committed on

feat: refactor TrainingArguments
adbdff9

boris committed on

fix: push_to_hub deprecated
23389f6

boris committed on

feat: support pypi
f5dba1e

boris committed on

doc: update contributions
e3b1b56

boris committed on

Merge pull request #117 from borisdayma/fix-inference
ef985be

boris committed on

fix(inference): use float32 + flatten logits
71c4de3

boris committed on

Merge pull request #115 from borisdayma/feat-shampoo
3a3d375

boris committed on