Questions about “Minibatch Optimal Transport”
#10, opened by zyx1213271098
I don't understand the principle of Minibatch Optimal Transport. Can you explain it in more detail? Why is a smaller distance more advantageous for model training? What impact does this have on the inference performance?
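Yeah, basically for every training batch you compute the optimal transport pairing between the noise samples and the images, so each image gets matched with a nearby noise sample instead of a random one. That's where the smaller distance comes from.
It converges faster because the model has more certainty about the vector it's regressing on. With random pairing, lots of different noise-to-image paths cross at the same point, so the expectation value the model has to learn there is an average over very different directions. With OT pairing the paths cross much less, so the regression target is far less noisy.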
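To make the pairing step concrete, here is a minimal sketch of it, not the actual training code. It assumes an exact assignment solver (`scipy.optimize.linear_sum_assignment`) on a squared Euclidean cost; the function name `minibatch_ot_pairing`, the toy shapes, and the interpolation convention at the end are all made up for illustration, and a real implementation might use a different solver or cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def minibatch_ot_pairing(noise, images):
    """Reorder `noise` so noise[i] is the OT partner of images[i] (hypothetical helper)."""
    b = noise.shape[0]
    n = noise.reshape(b, -1)
    x = images.reshape(b, -1)
    # cost[i, j] = squared Euclidean distance between noise i and image j
    cost = (n ** 2).sum(1)[:, None] + (x ** 2).sum(1)[None, :] - 2.0 * n @ x.T
    # Exact one-to-one assignment minimizing the total cost over the batch
    row, col = linear_sum_assignment(cost)
    perm = np.empty(b, dtype=int)
    perm[col] = row  # perm[j] = index of the noise sample assigned to image j
    return noise[perm]


# Toy batch standing in for real data (assumed shapes)
rng = np.random.default_rng(0)
images = rng.normal(loc=2.0, size=(64, 1, 8, 8))
noise = rng.normal(size=(64, 1, 8, 8))

paired_noise = minibatch_ot_pairing(noise, images)

# Mean squared distance per pair, before and after re-pairing
random_cost = ((noise - images) ** 2).sum(axis=(1, 2, 3)).mean()
ot_cost = ((paired_noise - images) ** 2).sum(axis=(1, 2, 3)).mean()

# Flow-matching style training targets built from the re-paired batch
# (assumed convention: t=0 is noise, t=1 is image)
t = rng.uniform(size=(64, 1, 1, 1))
x_t = (1.0 - t) * paired_noise + t * images
target_velocity = images - paired_noise
```

The noise samples themselves are unchanged, only their order is, so the noise distribution the model sees stays the same. The exact solver is cubic in the batch size, which is fine for typical batch sizes but is one reason this is done per minibatch rather than over the whole dataset.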
You can see the faster convergence here: MNIST has almost converged after just 1 epoch.
https://x.com/LodestoneE621/status/1893408571448049685
You can see that most of the flow paths are straighter too, so it can reduce the number of inference steps quite a bit (not as significantly as reflowing it again, though).