It is a combination of model mixing and training, focused on multiple people in realistic images. While it's not quite perfect at photorealism and may struggle a bit with very complex scenes or hands, our focus was to make it easy to prompt for multiple people.
For most users, it does well with the 'DPM++ 3M SDE Karras' sampler at 1024x1024.
It works well with LoRAs.