On the 1to2 experiment
As far as I understand, what you are trying to do in the 1to2 experiment has already been explored in this paper: https://arxiv.org/abs/2303.11305
Their regularization on the cross-attention part seems to be important, but implementing it would require further modification of the training code.
Thank you very much for the information. Guess I should've spent more effort on literature review. "Cutmix stable diffusion" was a poor search keyword.
It seems like they manually created the CutMixes. This doesn't scale for many characters, at least not trivially.
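For illustration, here is a rough sketch of how such composites could be generated automatically for every pair of characters. It assumes a simple left/right paste with a matching region mask; the `side_by_side` helper, the character names, and the dummy images are all hypothetical, and this is not necessarily how the paper builds its samples.

```python
from itertools import combinations

import numpy as np


def side_by_side(left, right):
    """Take the left half from `left` and the right half from `right`.

    Returns the composite and a boolean mask that is True where the
    pixels come from `left`. (Hypothetical helper, for illustration.)
    """
    if left.shape != right.shape:
        raise ValueError("images must share a shape")
    height, width = left.shape[:2]
    split = width // 2
    mixed = right.copy()
    mixed[:, :split] = left[:, :split]
    mask = np.zeros((height, width), dtype=bool)
    mask[:, :split] = True
    return mixed, mask


# Dummy 4x6 RGB "images", one flat color per hypothetical character.
images = {
    "char_a": np.full((4, 6, 3), 10, dtype=np.uint8),
    "char_b": np.full((4, 6, 3), 20, dtype=np.uint8),
    "char_c": np.full((4, 6, 3), 30, dtype=np.uint8),
}

# One composite per unordered pair: n characters give n * (n - 1) / 2 pairs.
pairs = list(combinations(sorted(images), 2))
samples = {(a, b): side_by_side(images[a], images[b]) for a, b in pairs}
```

The pair count grows quadratically with the number of characters, which is where the scaling concern comes from even when the pasting itself is automated.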
The paper has some interesting ideas such as only updating the singular values. I'll look deeper into it when I have time.
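A minimal sketch of what "only updating the singular values" could look like for a single weight matrix. It assumes the weight is decomposed once with an SVD and only a per-singular-value shift `delta` is trained, clipped at zero; the function names and the clipping are my assumptions, not the paper's confirmed formulation.

```python
import numpy as np


def decompose(weight):
    """Factor the frozen weight once: weight == (u * s) @ vt."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u, s, vt


def rebuild(u, s, vt, delta):
    """Rebuild the weight from shifted singular values.

    `delta` is the only trainable quantity in this sketch; the shifted
    values are clipped at zero so they remain valid singular values.
    """
    shifted = np.maximum(s + delta, 0.0)
    return (u * shifted) @ vt


rng = np.random.default_rng(0)
weight = rng.standard_normal((4, 3))  # stand-in for one layer's weight
u, s, vt = decompose(weight)

delta = np.zeros_like(s)              # zero shift reproduces the original
restored = rebuild(u, s, vt, delta)

trainable = delta.size                # 3 values here
full = weight.size                    # versus 12 for the full matrix
```

With a zero shift the original weight is recovered, and the trainable parameter count per matrix drops from rows × columns to the number of singular values.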
By the way, it's surprising that you found this here first instead of on Civitai, as I didn't post links to this anywhere.
Indeed, there are two parts to the paper. The first is essentially another variant of LoRA and, as with all the existing variants, I think it is hard to judge what its advantages and disadvantages would be.
The second part is mostly what you are trying to do here, as far as I understand. The cross-attention trick seems relevant, but it probably does not extend beyond two characters or to complex interactions. Anyway, these things are fundamentally challenging for SD, so I guess more substantial work would be needed to achieve that.
And I found this first because I came from the IDMS model :)
Btw, are you interested in joining the MyneFactory Discord? https://discord.gg/mvSGg3xU