arxiv:2405.20340

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Published on May 30
· Submitted by akhaliq on May 31
#3 Paper of the day
Abstract

This study delves into the realm of multi-modality (i.e., video and motion modalities) human behavior understanding by leveraging the powerful capabilities of Large Language Models (LLMs). Diverging from recent LLMs designed for video-only or motion-only understanding, we argue that understanding human behavior necessitates joint modeling from both videos and motion sequences (e.g., SMPL sequences) to capture nuanced body-part dynamics and semantics effectively. In light of this, we present MotionLLM, a straightforward yet effective framework for human motion understanding, captioning, and reasoning. Specifically, MotionLLM adopts a unified video-motion training strategy that leverages the complementary advantages of existing coarse video-text data and fine-grained motion-text data to glean rich spatial-temporal insights. Furthermore, we collect a substantial dataset, MoVid, comprising diverse videos, motions, captions, and instructions. Additionally, we propose MoVid-Bench, with careful manual annotations, for better evaluation of human behavior understanding on video and motion. Extensive experiments demonstrate the superiority of MotionLLM in captioning, spatial-temporal comprehension, and reasoning.
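To make the unified video-motion training strategy easier to picture, here is a minimal PyTorch-style sketch. It is an illustrative guess rather than the authors' released implementation: the module names, feature dimensions, per-modality projectors, and the placeholder LLM backbone are all assumptions made for clarity.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions -- not taken from the paper.
VIDEO_FEAT_DIM = 1024   # e.g. features from a frozen video encoder
MOTION_FEAT_DIM = 263   # e.g. per-frame SMPL-style motion features
LLM_DIM = 4096          # hidden size of the backbone LLM


class ModalityProjector(nn.Module):
    """Projects modality-specific features into the LLM token space."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.GELU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, seq_len, in_dim) -> (batch, seq_len, out_dim)
        return self.proj(feats)


class UnifiedBehaviorModel(nn.Module):
    """Toy stand-in for a MotionLLM-style model: one projector per
    modality feeds a shared LLM backbone (assumed to map token
    embeddings to logits)."""

    def __init__(self, llm_backbone: nn.Module):
        super().__init__()
        self.video_proj = ModalityProjector(VIDEO_FEAT_DIM, LLM_DIM)
        self.motion_proj = ModalityProjector(MOTION_FEAT_DIM, LLM_DIM)
        self.llm = llm_backbone

    def forward(self, feats: torch.Tensor, modality: str) -> torch.Tensor:
        if modality == "video":
            tokens = self.video_proj(feats)
        elif modality == "motion":
            tokens = self.motion_proj(feats)
        else:
            raise ValueError(f"unknown modality: {modality}")
        return self.llm(tokens)


def unified_training_step(model, batch, loss_fn, optimizer):
    """One step of the unified strategy: a batch carries either coarse
    video-text pairs or fine-grained motion-text pairs, and both update
    the same shared backbone through their own projector."""
    feats, targets, modality = batch  # modality is "video" or "motion"
    logits = model(feats, modality)
    loss = loss_fn(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key design point the sketch tries to convey is that video-text and motion-text supervision share one language backbone, so the coarse and fine-grained data sources complement each other rather than training separate models.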

Community

hey @EvanTHU would you like to host the model in a model repo and host a demo for it on Spaces? we can provide you a compute grant (free A100) 🤗

Paper author

Some folks from Hugging Face have reached out to me. I am working on this.


How MotionLLM is Revolutionizing Human Behavior Understanding!

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix


Models citing this paper 0

No models link this paper yet.

Cite arxiv.org/abs/2405.20340 in a model README.md to link it from this page.

Datasets citing this paper 0

No datasets link this paper yet.

Cite arxiv.org/abs/2405.20340 in a dataset README.md to link it from this page.

Spaces citing this paper 1

Collections including this paper 7