Mark regions in images based on text descriptions
The massive multimodal embedding benchmark
Generate images from text prompts with a specific style
Training-Free Personalization of Diffusion Models Using Stochastic Optimal Control
Generate images from text prompts
4M: Massively Multimodal Masked Modeling