One Shot, One Talk: Whole-body Talking Avatar from a Single Image
Abstract
Building realistic and animatable avatars still requires minutes of multi-view or monocular self-rotating videos, and most methods lack precise control over gestures and expressions. To push this boundary, we address the challenge of constructing a whole-body talking avatar from a single image. We propose a novel pipeline that tackles two critical issues: 1) complex dynamic modeling and 2) generalization to novel gestures and expressions. To achieve seamless generalization, we leverage recent pose-guided image-to-video diffusion models to generate imperfect video frames as pseudo-labels. To overcome the dynamic modeling challenge posed by inconsistent and noisy pseudo-videos, we introduce a tightly coupled 3DGS-mesh hybrid avatar representation and apply several key regularizations to mitigate inconsistencies caused by imperfect labels. Extensive experiments on diverse subjects demonstrate that our method enables the creation of a photorealistic, precisely animatable, and expressive whole-body talking avatar from just a single image.
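The key idea in the abstract is to fit the avatar against imperfect, diffusion-generated pseudo-label frames while down-weighting their inconsistencies through regularization. The sketch below is purely illustrative and not the authors' implementation: the function name, the confidence weights, and the toy data are all hypothetical, and stand in for the paper's actual rendering and regularization losses.

```python
import numpy as np

def weighted_photometric_loss(rendered, pseudo, conf):
    """Confidence-weighted L1 photometric loss.

    Down-weights pseudo-label frames that are suspected to be
    inconsistent or noisy (all names here are illustrative; the
    paper's actual regularizations are more involved).
    rendered, pseudo: (N, H, W, 3) arrays; conf: (N,) per-frame weights.
    """
    per_frame = np.abs(rendered - pseudo).mean(axis=(1, 2, 3))  # L1 per frame
    w = conf / conf.sum()                                        # normalize weights
    return float((w * per_frame).sum())

# Toy data: 4 pseudo-label frames of 8x8 RGB, with synthetic noise
rng = np.random.default_rng(0)
rendered = rng.random((4, 8, 8, 3))
pseudo = rendered + 0.1 * rng.standard_normal((4, 8, 8, 3))
conf = np.array([1.0, 0.9, 0.2, 0.8])  # low confidence on frame 2
loss = weighted_photometric_loss(rendered, pseudo, conf)
```

In this toy setup a frame flagged as unreliable simply contributes less to the total loss, which is one generic way to keep noisy pseudo-labels from corrupting the fitted avatar.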
Community
Our method reconstructs an expressive 3D talking avatar from a single one-shot image (e.g., your favorite photo); the avatar fully preserves the subject's identity and supports realistic animation, including vivid body gestures and natural expression changes.
Project page: https://ustc3dv.github.io/OneShotOneTalk/
arXiv: https://arxiv.org/abs/2412.01106
This is an automated message from Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction (2024)
- ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance (2024)
- SOAR: Self-Occluded Avatar Recovery from a Single Video In the Wild (2024)
- GaussianSpeech: Audio-Driven Gaussian Avatars (2024)
- Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes (2024)
- SAGA: Surface-Aligned Gaussian Avatar (2024)
- Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts (2024)