training an e-diffi reproduction with ssd-1b
hello, i added support for ssd-1b to simpletuner, and then extended it so that training can be split between two models that each cover a separate timestep range.
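to make the timestep split concrete, here is a minimal sketch of the idea. it is not simpletuner's implementation (its actual options are in the MIXTURE_OF_EXPERTS.md doc linked below). the names `timestep_range`, `sample_timesteps`, `expert_for_timestep` and the boundary value of 650 are hypothetical, and it assumes stage 1 owns the high-noise end of the schedule, as in eDiff-I-style setups.

```python
import random

NUM_TRAIN_TIMESTEPS = 1000
# hypothetical split point, chosen only for illustration
BOUNDARY = 650


def timestep_range(stage: int, boundary: int = BOUNDARY,
                   total: int = NUM_TRAIN_TIMESTEPS) -> tuple[int, int]:
    """inclusive (low, high) range of timesteps one expert is trained on."""
    if stage == 1:  # assumed: stage 1 = high-noise timesteps
        return boundary, total - 1
    if stage == 2:  # assumed: stage 2 = low-noise timesteps
        return 0, boundary - 1
    raise ValueError(f"unknown stage: {stage}")


def sample_timesteps(stage: int, batch_size: int, rng: random.Random) -> list[int]:
    """uniformly sample training timesteps restricted to one expert's range."""
    low, high = timestep_range(stage)
    return [rng.randint(low, high) for _ in range(batch_size)]  # randint is inclusive


def expert_for_timestep(t: int, boundary: int = BOUNDARY) -> int:
    """at inference, hand each denoising step to the expert that owns it."""
    return 1 if t >= boundary else 2


rng = random.Random(0)
print(sample_timesteps(1, 4, rng), sample_timesteps(2, 4, rng))
```

because each expert only ever sees its own slice of the schedule during training, inference has to use the same boundary when switching from one model to the other.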
while doing that, i also tried re-parameterising the ssd-1b weights to v-prediction using a zero-terminal snr noise schedule.
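for reference, a zero-terminal snr schedule can be obtained by rescaling the usual betas so that the cumulative alpha at the last timestep is exactly zero (the rescale from Lin et al., "Common Diffusion Noise Schedules and Sample Steps are Flawed"). at that step the input is pure noise, so epsilon-prediction is degenerate there, and the v-prediction target v = sqrt(ᾱ_t)·ε − sqrt(1 − ᾱ_t)·x0 reduces to −x0. this is a plain-python sketch, not necessarily how simpletuner does it; it assumes the sdxl-style "scaled_linear" schedule (beta_start=0.00085, beta_end=0.012, 1000 steps), and the function names are hypothetical.

```python
import math


def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    # assumed sdxl-style schedule: linear in sqrt(beta), then squared
    start, end = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(start + (end - start) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def alphas_cumprod(betas):
    # abar_t = prod_{s<=t} (1 - beta_s)
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def rescale_zero_terminal_snr(betas):
    abar_sqrt = [math.sqrt(a) for a in alphas_cumprod(betas)]
    first, last = abar_sqrt[0], abar_sqrt[-1]
    # shift so the final sqrt(abar) is exactly 0, then scale so the first is unchanged
    abar_sqrt = [(s - last) * first / (first - last) for s in abar_sqrt]
    abar = [s * s for s in abar_sqrt]
    # convert cumulative products back into per-step alphas, then betas
    alphas = [abar[0]] + [abar[i] / abar[i - 1] for i in range(1, len(abar))]
    return [1.0 - a for a in alphas]


def v_target(x0, noise, abar_t):
    # v-prediction target for one timestep, elementwise over flat lists
    a, s = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    return [a * e - s * x for x, e in zip(x0, noise)]


betas = rescale_zero_terminal_snr(scaled_linear_betas())
print(alphas_cumprod(betas)[-1])  # terminal abar, should be 0 (zero snr)
```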
it worked within about 800 steps, which i didn't expect; this model trains very quickly. the last time i attempted a model reparameterisation it was on SDXL 0.9, and it always ran into mode collapse.
the lightweight nature of this model is greatly appreciated: an A6000 can meaningfully train it, so there's no need to rent or buy an A100.
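a rough back-of-envelope for why the smaller unet matters on a 48 GB A6000: with full fine-tuning in fp32 under AdamW, weights, gradients and the two optimiser moments cost about 16 bytes per parameter before any activations. the parameter counts below (~1.3B for the ssd-1b unet, ~2.6B for the sdxl unet) and the optimiser setup are assumptions for illustration, not measurements from simpletuner, which also offers memory-saving options that this ignores.

```python
# assumed per-parameter byte costs for full fp32 fine-tuning with AdamW
BYTES_WEIGHTS = 4
BYTES_GRADS = 4
BYTES_ADAM_STATES = 8  # exp_avg + exp_avg_sq, fp32 each


def static_training_bytes(num_params: float) -> float:
    """memory for weights + grads + optimiser state, excluding activations."""
    return num_params * (BYTES_WEIGHTS + BYTES_GRADS + BYTES_ADAM_STATES)


def gib(n_bytes: float) -> float:
    return n_bytes / 2**30


# approximate unet sizes (assumptions, not exact counts)
for name, params in [("ssd-1b", 1.3e9), ("sdxl", 2.6e9)]:
    print(f"{name}: ~{gib(static_training_bytes(params)):.1f} GiB before activations")
```

under these assumptions ssd-1b needs roughly half the static memory of sdxl, which leaves far more of the card for activations at training resolution.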
some examples of the Segmind e-Diffi combined with my foundational model "Terminus" as stage 1 (these weights are still undertrained).
an example of how to reproduce the results is here: https://github.com/bghira/SimpleTuner/blob/main/documentation/MIXTURE_OF_EXPERTS.md
Great work!