NoPo4D: No Pose, No Problem in 4D
Feed-Forward Dynamic 4D Gaussian Splatting from Unposed Multi-View Videos
This work presents NoPo4D, the first feed-forward system that jointly addresses dynamic content, multi-view input, and unknown camera poses in a single pass. Its approach to pose-free 4D reconstruction rests on two key insights:
- A decomposed velocity representation splits Gaussian motion into per-pixel image-plane shifts and depth changes. This allows direct supervision from 2D optical flow, obviating the need for complex 3D motion ground truth or differentiable rendering.
- A bidirectional motion encoder paired with view-dependent opacity effectively aggregates cross-view features and mitigates cross-timestep Gaussian misalignments.
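To make the decomposed velocity idea concrete, here is a minimal sketch of how a per-pixel image-plane shift and a depth change could be lifted to a 3D displacement under a pinhole camera model. The function names `unproject` and `lift_velocity`, the pixel-unit intrinsics `fx, fy, cx, cy`, and the assumption that both timesteps are expressed in the same camera frame are illustrative choices, not taken from the NoPo4D code.

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at the given depth to a 3D camera-space point."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def lift_velocity(u, v, depth, du, dv, ddepth, fx, fy, cx, cy):
    """Hypothetical helper: turn (image-plane shift, depth change) into a 3D displacement.

    Assumes a pinhole camera with pixel-unit intrinsics and that both
    timesteps are expressed in the same camera frame.
    """
    p0 = unproject(u, v, depth, fx, fy, cx, cy)
    p1 = unproject(u + du, v + dv, depth + ddepth, fx, fy, cx, cy)
    return tuple(b - a for a, b in zip(p0, p1))


# A point at the principal point, 2 units deep, moving 10 px right with no depth change.
displacement = lift_velocity(50.0, 50.0, 2.0, 10.0, 0.0, 0.0, 100.0, 100.0, 50.0, 50.0)
print(displacement)
```

Since `(du, dv)` is the quantity a 2D optical flow field measures, the image-plane part of such a representation can be compared against flow directly, which is the property the first insight relies on.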
NoPo4D consistently outperforms prior feed-forward baselines across four multi-view dynamic benchmarks (ExoRecon, Immersive Light Field, Kubric, and N3DV). With an optional post-optimization stage, it surpasses per-scene optimization methods while running orders of magnitude faster.
Quick Start
For full installation instructions (including the custom Depth Anything 3 backbone and optional torch-scatter), please refer to the NoPo4D GitHub repository.
Once the dependencies are installed, you can use the model directly from the Hugging Face Hub via our Python API:
import torch
from src.model.nopo4d import NoPo4D
# Load pretrained model from Hugging Face
model = NoPo4D.from_pretrained("bralani01/nopo4d")
model = model.to("cuda").eval()
# Define your inputs
# images: (B, V, 3, H, W), camera-major order
# V = num_cameras * num_frames
# e.g. 2 cameras x 3 frames -> [cam0_t0, cam0_t1, cam0_t2, cam1_t0, cam1_t1, cam1_t2]
# timestamps: (B, V) in [0, 1], same layout as images; pass None for static scenes
# Run the Encoder
encoder_output = model(
    images=images,
    timestamps=timestamps,
    num_cameras=num_cameras,
)
# encoder_output.gaussians: 4D Gaussian primitives
# encoder_output.camera_pose: predicted extrinsics / intrinsics
# encoder_output.depth: per-view depth maps
# encoder_output.optical_flow: per-view forward / backward flow
# Render Novel Views
render_output = model.render(
    gaussians=encoder_output.gaussians,
    extrinsics=target_extrinsics,  # (B, V, 4, 4) c2w matrices
    intrinsics=target_intrinsics,  # (B, V, 3, 3) normalised intrinsics
    image_shape=(H, W),
    timestamps=target_timestamps,  # (B, V) or None
)
# render_output.color: (B, V, 3, H, W)
# render_output.depth: (B, V, H, W)
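The camera-major layout described in the input comments above is easy to get wrong when frames arrive grouped by timestep. The following is a minimal sketch of the index arithmetic in plain Python. The helper names are hypothetical, and the evenly spaced timestamp normalisation is an assumption, since the source only states that timestamps lie in [0, 1].

```python
def camera_major_index(cam, frame, num_frames):
    """Position of (cam, frame) along the V axis when views are stored camera-major."""
    return cam * num_frames + frame


def camera_major_timestamps(num_cameras, num_frames):
    """Timestamps in [0, 1], repeated per camera, in the same camera-major order.

    Assumes frames are evenly spaced in time; a single frame maps to 0.0.
    """
    if num_frames == 1:
        per_camera = [0.0]
    else:
        per_camera = [t / (num_frames - 1) for t in range(num_frames)]
    return per_camera * num_cameras


def frame_major_to_camera_major(views, num_cameras, num_frames):
    """Reorder [t0_cam0, t0_cam1, ..., t1_cam0, ...] into camera-major order."""
    return [
        views[frame * num_cameras + cam]
        for cam in range(num_cameras)
        for frame in range(num_frames)
    ]


num_cameras, num_frames = 2, 3
# Frame-major labels stand in for per-view image tensors.
labels = [f"cam{c}_t{t}" for t in range(num_frames) for c in range(num_cameras)]
ordered = frame_major_to_camera_major(labels, num_cameras, num_frames)
timestamps = camera_major_timestamps(num_cameras, num_frames)
print(ordered)
print(timestamps)
```

With real data, the reordered per-view tensors could then be combined with `torch.stack` and given a leading batch dimension, and the timestamp list converted with `torch.tensor`, before being passed to the model.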
Citation
@article{nopo4d,
title={NoPo4D: No Pose, No Problem in 4D},
author={TODO},
journal={arXiv preprint arXiv:TODO},
year={2026}
}
License
The code and models are licensed under the MIT License.
Acknowledgement
We thank the authors of these excellent works:
- Depth Anything 3: backbone ViT
- gsplat: CUDA Gaussian splatting backend
- AnySplat: feed-forward Gaussian splatting framework