Abstract
We introduce an approach for detecting and tracking detailed 3D poses of multiple people from a single monocular camera stream. Our system maintains temporally coherent predictions in crowded scenes filled with difficult poses and occlusions. Our model performs both strong per-frame detection and a learned pose update to track people from frame to frame. Rather than matching detections across time, it updates poses directly from each new input image, which enables online tracking through occlusion. We train on numerous image and video datasets, leveraging pseudo-labeled annotations, to produce a model that matches state-of-the-art systems in 3D pose estimation accuracy while being faster and more accurate in tracking multiple people through time. Code and weights are provided at https://github.com/apple/ml-comotion
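The control flow described above (update existing tracks directly from the new frame, and use per-frame detection only to start tracks for people not yet followed) can be illustrated with a toy sketch. This is not the authors' implementation: the names `Track`, `step`, `toy_update`, `toy_detect`, and `toy_is_new` are hypothetical, a "pose" is reduced to a single 1D position, and a "frame" is just the list of positions visible in it. In the actual system the update and detection steps are learned networks; see the linked repository for the real code.

```python
from dataclasses import dataclass
from typing import Callable, List

Pose = List[float]   # stand-in for a full 3D pose parameter vector
Frame = List[float]  # stand-in for an image: the 1D positions visible in it


@dataclass
class Track:
    track_id: int
    pose: Pose


def step(
    tracks: List[Track],
    frame: Frame,
    update_pose: Callable[[Pose, Frame], Pose],
    detect: Callable[[Frame], List[Pose]],
    is_new: Callable[[Pose, List[Track]], bool],
    next_id: int,
) -> int:
    """Advance all tracks by one frame and return the next free track id."""
    # 1) Update every existing track directly from the new frame.
    #    No detection-to-track matching happens here.
    for track in tracks:
        track.pose = update_pose(track.pose, frame)
    # 2) Use per-frame detections only to start tracks for unseen people.
    for det in detect(frame):
        if is_new(det, tracks):
            tracks.append(Track(next_id, det))
            next_id += 1
    return next_id


def toy_update(pose: Pose, frame: Frame, radius: float = 1.0) -> Pose:
    # Toy assumption: move to the nearest visible position within `radius`;
    # if nothing is nearby (occlusion), keep the previous pose.
    near = [x for x in frame if abs(x - pose[0]) < radius]
    if not near:
        return pose
    return [min(near, key=lambda x: abs(x - pose[0]))]


def toy_detect(frame: Frame) -> List[Pose]:
    # Toy assumption: every visible position is a detection.
    return [[x] for x in frame]


def toy_is_new(det: Pose, tracks: List[Track], radius: float = 1.0) -> bool:
    # A detection starts a new track only if no existing track is close to it.
    return all(abs(det[0] - t.pose[0]) >= radius for t in tracks)


def run_demo() -> List[Track]:
    # The second person (near 5.0) is not visible in the middle frame.
    frames = [[0.0, 5.0], [0.4], [0.8, 5.2]]
    tracks: List[Track] = []
    next_id = 0
    for frame in frames:
        next_id = step(tracks, frame, toy_update, toy_detect, toy_is_new, next_id)
    return tracks
```

In this toy run the second person is missing from the middle frame, yet the track keeps its id and resumes updating when the person reappears, because track identity never depends on re-associating detections across frames.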