@rwightman on Hugging Face: "There's a new `timm` release, v 1.0.12, with a focus on optimizers. The…"

rwightman

posted an update Dec 3, 2024

There's a new timm release, v1.0.12, with a focus on optimizers. The optimizer factory has been refactored, and there's now a timm.optim.list_optimizers() plus a new way to register optimizers and their attributes. As always, you can use a timm optimizer like a torch one: just replace torch.optim with timm.optim.

New optimizers include:
* AdafactorBigVision - adafactorbv
* ADOPT - adopt / adoptw (decoupled decay)
* MARS - mars
* LaProp - laprop
* Cautious Optimizers - a modification available for all of the above (prefix the name with c), as well as cadamw, cnadamw, csgdw, clamb, crmsproptf

I shared some caution comparisons in this model repo: rwightman/timm-optim-caution

For details, references, see the code: https://github.com/huggingface/pytorch-image-models/tree/main/timm/optim

mrdbourke

Dec 3, 2024

Woah, looks like a good boost across most results. Been using torch.optim.adamw for months. Will try out a training run today with timm.optim.cadamw

rwightman

Dec 4, 2024

Yeah, it's been working out well in runs so far, but as is often the case with new optimizers or optimizer enhancements, mileage can vary depending on many variables. Curious to know how it works for your case. Case in point: I had some great fine-tune results with adopt, but in this mini-imagenet case it rather flopped. MARS, though, is actually doing really well here, and MARS w/ caution even better, so it's very hard to cover all ground with new optimizers. MARS results to be added soon.
