Eungbean Lee's picture
7

Eungbean Lee

Eungbean
Β·

AI & ML interests

Computer Vision

Recent Activity

liked a Space 26 days ago
gokaygokay/Flux-TRELLIS
liked a model 29 days ago
ostris/ip-composition-adapter
liked a model 29 days ago
ostris/OpenFLUX.1
View all activity

Organizations

Digital Image and Media Lab's profile picture

Eungbean's activity

reacted to multimodalart's post with πŸ‘ 11 months ago
view post
Post
The Stable Diffusion 3 research paper broken down, including some overlooked details! πŸ“

Model
πŸ“ 2 base model variants mentioned: 2B and 8B sizes

πŸ“ New architecture in all abstraction levels:
- πŸ”½ UNet; ⬆️ Multimodal Diffusion Transformer, bye cross attention πŸ‘‹
- πŸ†• Rectified flows for the diffusion process
- 🧩 Still a Latent Diffusion Model

πŸ“„ 3 text-encoders: 2 CLIPs, one T5-XXL; plug-and-play: removing the larger one maintains competitiveness

πŸ—ƒοΈ Dataset was deduplicated with SSCD which helped with memorization (no more details about the dataset tho)

Variants
πŸ” A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
✏️ An Instruct Edit 2B model was trained, and learned how to do text-replacement

Results
βœ… State of the art in automated evals for composition and prompt understanding
βœ… Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (missing some details on how many participants and the design of the experiment)

Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf
Β·