Llamas vs Capybaras

AI & ML interests

Capybaras actually win

Recent Activity

llama-vs-capybara's activity

multimodalart
posted an update 7 months ago
The first open Stable Diffusion 3-like architecture model is JUST out 💣 - but it is not SD3! 🤔

It is Tencent-Hunyuan/HunyuanDiT by Tencent, a 1.5B parameter DiT (diffusion transformer) text-to-image model 🖼️✨, trained with multilingual CLIP + multilingual T5 text encoders for English 🤝 Chinese understanding

Try it out yourself here ▶️ https://huggingface.co/spaces/multimodalart/HunyuanDiT
(a bit too slow as the model is chunky and the research code isn't super optimized for inference speed yet)

In the paper they claim to be SOTA open source based on human preference evaluation!
multimodalart
posted an update 10 months ago
The Stable Diffusion 3 research paper broken down, including some overlooked details!

Model
2 base model variants mentioned: 2B and 8B sizes

New architecture in all abstraction levels:
- 🔽 UNet; ⬆️ Multimodal Diffusion Transformer, bye cross attention 👋
- 🆕 Rectified flows for the diffusion process
- 🧩 Still a Latent Diffusion Model

📄 3 text encoders: 2 CLIPs, one T5-XXL; plug-and-play: removing the larger one maintains competitiveness

🗃️ Dataset was deduplicated with SSCD, which helped reduce memorization (no more details about the dataset tho)

Variants
A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
An Instruct Edit 2B model was trained, and learned how to do text-replacement

Results
✅ State of the art in automated evals for composition and prompt understanding
✅ Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (the paper is missing details on the number of participants and the design of the experiment)

Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf
multimodalart
posted an update 11 months ago
It seems February started with a fully open source AI renaissance 🌟

Models released with fully open dataset, training code, weights ✅

LLM - allenai/olmo-suite-65aeaae8fe5b6b2122b46778 🧠
Embedding - nomic-ai/nomic-embed-text-v1 📚 (sota!)

And it's literally February 1st - can't wait to see what else the community will bring 👀