Improve model card: add metadata, abstract, and setup instructions (#1)
- Improve model card: add metadata, abstract, and setup instructions (b03f31b63a5f679292e91baededdb5250fc9353a)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED

@@ -1,3 +1,9 @@
+---
+license: mit
+pipeline_tag: text-to-video
+library_name: diffusers
+---
+
<div align="center">

# Pulp Motion: Framing-aware multimodal camera and human motion generation

@@ -16,6 +22,10 @@

</div>

+This model was presented in the paper [Pulp Motion: Framing-aware multimodal camera and human motion generation](https://huggingface.co/papers/2510.05097).
+
+## Abstract
+Treating human motion and camera trajectory generation separately overlooks a core principle of cinematography: the tight interplay between actor performance and camera work in the screen space. In this paper, we are the first to cast this task as a text-conditioned joint generation, aiming to maintain consistent on-screen framing while producing two heterogeneous, yet intrinsically linked, modalities: human motion and camera trajectories. We propose a simple, model-agnostic framework that enforces multimodal coherence via an auxiliary modality: the on-screen framing induced by projecting human joints onto the camera. This on-screen framing provides a natural and effective bridge between modalities, promoting consistency and leading to a more precise joint distribution. We first design a joint autoencoder that learns a shared latent space, together with a lightweight linear transform from the human and camera latents to a framing latent. We then introduce auxiliary sampling, which exploits this linear transform to steer generation toward a coherent framing modality. To support this task, we also introduce the PulpMotion dataset, a human-motion and camera-trajectory dataset with rich captions and high-quality human motions. Extensive experiments across DiT- and MAR-based architectures show the generality and effectiveness of our method in generating on-frame coherent human-camera motions, while also achieving gains on textual alignment for both modalities. Our qualitative results yield more cinematographically meaningful framings, setting the new state of the art for this task.

<div align="center">
<a href="https://www.lix.polytechnique.fr/vista/projects/2025_pulpmotion_courant/" class="button"><b>[Webpage]</b></a>

@@ -43,4 +53,4 @@ Prepare the dataset (untar archives):
```
cd pulpmotion-models
sh download_smpl
-```
+```
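
The abstract added above describes a lightweight linear transform from the human and camera latents to a framing latent, and an auxiliary sampling step that uses this transform to steer generation toward a coherent framing. The sketch below only illustrates that idea: the class and function names, the latent dimensions, and the single gradient-based correction step are assumptions made for illustration, not the repository's actual implementation.

```python
# Minimal, illustrative sketch (hypothetical names and sizes), not the PulpMotion code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FramingHead(nn.Module):
    """Lightweight linear map from human-motion and camera latents to a framing latent."""
    def __init__(self, human_dim: int = 256, camera_dim: int = 64, framing_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(human_dim + camera_dim, framing_dim, bias=False)

    def forward(self, z_human: torch.Tensor, z_camera: torch.Tensor) -> torch.Tensor:
        # Concatenate the two modality latents and map them to the framing latent.
        return self.proj(torch.cat([z_human, z_camera], dim=-1))

def auxiliary_step(z_human, z_camera, target_framing, head, step_size=0.1):
    """One guidance-style update nudging both latents toward a target framing latent."""
    z_human = z_human.detach().requires_grad_(True)
    z_camera = z_camera.detach().requires_grad_(True)
    loss = F.mse_loss(head(z_human, z_camera), target_framing)
    loss.backward()
    with torch.no_grad():
        z_human -= step_size * z_human.grad
        z_camera -= step_size * z_camera.grad
    return z_human.detach(), z_camera.detach()

if __name__ == "__main__":
    head = FramingHead()
    z_h, z_c = torch.randn(1, 256), torch.randn(1, 64)
    target = torch.randn(1, 128)
    z_h, z_c = auxiliary_step(z_h, z_c, target, head)
    print(z_h.shape, z_c.shape)  # torch.Size([1, 256]) torch.Size([1, 64])
```

In the paper's framework such a correction would presumably be interleaved with the DiT- or MAR-based sampler's steps rather than applied once as shown here.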