Xiangyi Li (xdotli)
0 followers · 4 following
https://www.xiangyi.li
AI & ML interests
None yet
Recent Activity
liked a Space 2 days ago: ServiceNow/browsergym-leaderboard
reacted to KaiChen1998's post 5 days ago
Our EMOVA paper has been accepted by CVPR 2025, and we are glad to release all resources, including code (training & inference), datasets (training & evaluation), and checkpoints (EMOVA-3B/7B/72B)!

EMOVA is a novel end-to-end omni-modal LLM that can see, hear, and speak. Given omni-modal (i.e., textual, visual, and speech) inputs, EMOVA generates both textual and speech responses with vivid emotional control via its speech decoder and style controller.

EMOVA highlights:
- State-of-the-art omni-modality: EMOVA achieves results comparable to the state of the art on both vision-language and speech benchmarks simultaneously.
- Device adaptation: the codebase supports training and inference on both NVIDIA GPUs (e.g., A800 & H20) and Ascend NPUs (e.g., 910B3).
- Modular design: multiple implementations of the vision encoder, vision projector, and language model are integrated, including the recent DeepSeekMoE-tiny.

You are all welcome to try it and star the repo! (A hedged loading sketch for these checkpoints appears just after this activity list.)
- Project page: https://emova-ollm.github.io/
- GitHub: https://github.com/emova-ollm/EMOVA
- Demo: https://huggingface.co/spaces/Emova-ollm/EMOVA-demo
upvoted a paper 7 days ago: "Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong"
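The EMOVA post above mentions released checkpoints (EMOVA-3B/7B/72B). For anyone who wants to try them programmatically rather than through the demo Space, here is a minimal loading sketch. It is an assumption-laden sketch, not the project's documented API: the repo id Emova-ollm/EMOVA-7B is inferred from the demo URL, and the trust_remote_code pattern is assumed from typical omni-modal Hugging Face releases; the actual inference entry points live in the GitHub repo.

```python
# Minimal sketch of loading an EMOVA checkpoint from the Hugging Face Hub.
# ASSUMPTIONS: the repo id "Emova-ollm/EMOVA-7B" (inferred from the demo URL)
# and custom modeling code exposed via trust_remote_code are not confirmed
# by the post; see https://github.com/emova-ollm/EMOVA for the real API.
from transformers import AutoModel, AutoProcessor

model_id = "Emova-ollm/EMOVA-7B"  # hypothetical repo id

# Omni-modal checkpoints usually ship custom code, hence trust_remote_code.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",  # requires the `accelerate` package
)

# EMOVA accepts text/image/speech inputs and returns text or speech; the
# exact preprocessing and generate() signature are repo-specific, so consult
# the project README for a runnable end-to-end example.
```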
xdotli's activity
liked a Space 2 days ago: ServiceNow/browsergym-leaderboard (BrowserGym Leaderboard, running, 16 likes: "Display data interactively")
liked a dataset 12 days ago: upstage/dp-bench (updated Oct 24, 2024 • 1.33k downloads • 67 likes)
liked a dataset 6 months ago: dair-ai/emotion (Viewer • updated Aug 8, 2024 • 437k rows • 14.1k downloads • 332 likes)