- Matryoshka Diffusion Models
  Paper • 2310.15111 • Published • 42
- AToM: Amortized Text-to-Mesh using 2D Diffusion
  Paper • 2402.00867 • Published • 11
- Neural Network Diffusion
  Paper • 2402.13144 • Published • 95
- Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
  Paper • 2402.19479 • Published • 34
mexicanamerican (PRO)
AI & ML interests: None yet
Recent Activity
- Liked a model about 18 hours ago: bartowski/BeaverAI_MN-2407-DSK-QwQify-v0.1-12B-GGUF
- Reacted to KaiChen1998's post about 18 hours ago:
📢 Our EMOVA paper has been accepted by CVPR 2025, and we are glad to release all resources, including code (training & inference), datasets (training & evaluation), and checkpoints (EMOVA-3B/7B/72B)!
EMOVA is a novel end-to-end omni-modal LLM that can see, hear and speak. Given omni-modal (i.e., textual, visual and speech) inputs, EMOVA can generate both textual and speech responses with vivid emotional controls by utilizing the speech decoder and a style controller.
✨ EMOVA Highlights
✅ State-of-the-art omni-modality: EMOVA achieves results comparable to the state of the art on both vision-language and speech benchmarks simultaneously.
✅ Device adaptation: our codebase supports training/inference on both NVIDIA GPUs (e.g., A800 & H20) and Ascend NPUs (e.g., 910B3)!
✅ Modular design: we integrate multiple implementations of the vision encoder, vision projector, and language model, even including the most recent DeepSeekMoE-tiny!
🔥 You are all welcome to try it out and star the repo!
- Project page: https://emova-ollm.github.io/
- Github: https://github.com/emova-ollm/EMOVA
- Demo: https://huggingface.co/spaces/Emova-ollm/EMOVA-demo
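The post above describes EMOVA's end-to-end pipeline: omni-modal (text, image, speech) inputs go into the LLM, and a speech decoder plus style controller produce the spoken response. As a rough orientation only, here is a minimal sketch of loading a released checkpoint with the generic transformers trust_remote_code pattern; the repo id Emova-ollm/EMOVA-7B, the processor call, and the generate interface are assumptions rather than EMOVA's documented API, so consult the GitHub README linked above for the actual usage.

```python
# Hedged sketch only: the repo id and the processor/generate interface below are
# assumptions following the generic multimodal transformers pattern, not EMOVA's
# documented API (see the GitHub README for the real entry points).
from PIL import Image
from transformers import AutoModel, AutoProcessor

repo_id = "Emova-ollm/EMOVA-7B"  # assumed name; 3B/7B/72B checkpoints were released

processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True, device_map="auto")

# Vision-language request; speech input/output would additionally go through the
# speech tokenizer/decoder and style controller described in the post.
image = Image.open("example.jpg")
inputs = processor(text="Describe this image.", images=image, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```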
- Updated a model about 19 hours ago: mexicanamerican/DeepSeek-R1-1.5B-Medical-COT-Q8_0-GGUF
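For context, a Q8_0 GGUF checkpoint like the one above is typically downloaded from the Hub and run locally through llama.cpp bindings. A minimal sketch, assuming llama-cpp-python and huggingface_hub are installed; the GGUF filename inside the repo and the prompt are illustrative assumptions, not taken from the model card.

```python
# Minimal sketch: fetch a GGUF quant from the Hub and run it with llama.cpp bindings.
# The filename below is an assumption; check the repo's file listing for the real one.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="mexicanamerican/DeepSeek-R1-1.5B-Medical-COT-Q8_0-GGUF",
    filename="deepseek-r1-1.5b-medical-cot-q8_0.gguf",  # assumed filename
)

llm = Llama(model_path=gguf_path, n_ctx=4096)  # runs on CPU by default

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "List common causes of iron-deficiency anemia."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```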
Collections: 6 • Spaces: 2 • Models: 3 • Datasets: none public yet