Commit History
feat(train): log norm and histograms (#143)
b7b619a
feat(data): super conditioning (#141)
7939874
feat: support pod (#139)
803ccbf
fix: no gradient checkpointing for new model
2e02683
feat: no gradient checkpointing for params init
b798ed3
fix(train): consider schedule offset
bc4734f
feat(train): local jax cache
9f5e879
feat: add bucket reference to artifact
d368fb6
style: lint
d5d442a
feat: handle gradient checkpointing
5173ec7
feat: load from bucket
1c4e839
feat(train): save to bucket
50498e6
feat: reduce artifact space + offset step
34cf91c
feat: restore weights on CPU
5f954fc
feat(train): simplify tokenizer loading
4cb21dd
feat(train): use compilation cache
da9367c
feat: log num_parameters early
7cfe576
feat(train) - handle multiple nodes (#130)
0952927
feat: handle model parallel
1bb3269
feat(train): more custom x-axis
5f28cd2
fix(train): opt_state_shape for distributed_shampoo
225b6ff
feat(train): split artifact into model/state
fa5b058
feat(train): another 25% faster
14abe8c
feat(train): overhead from 70% to 1% 🥳
2b7f5f1
feat(pjit): follow t5x style
7b5868f
fix(train): grads spec
00710bc
feat(train): improve pjit speed
f254058
fix(train): consider correct batch size
b7c7458
feat(train): custom start_preconditioning_step
8149924
feat(train): handle distributed_shampoo in pjit
032f623
feat(train): distributed_shampoo with pjit
cc34d07
fix style
f044cb8
feat(train): restore opt_state efficiently
1bfc1b5
feat(model): clean way to load on cpu
12f323d
feat(train): load model on CPU
3d43591
feat(train): different rng per node
2d212d8
feat(train): no batch dimension with pjit
df1fe19
feat(train): progress on pjit
49597a2
feat(train): start pjit support
0081723
Load from wandb artifact (#121)
f69b21b
Use DalleBartTokenizer. State restoration reverted to previous method:
ae983d7
Pedro Cuenca
fix(train): variable not defined
4c87adf
feat(train): cleanup args
a2bf605
refactor(train): cleanup
274ba73
feat: custom gradient accumulation
2d07559
fix: style
df01fa8
feat(train): use MultiSteps for gradient accumulation
4fa53a5
Accept changes suggested by linter.
9f522b8
Pedro Cuenca
Update help string for `model_name_or_path`.
290e443
Pedro Cuenca