NoPo4D: No Pose, No Problem in 4D

Feed-Forward Dynamic 4D Gaussian Splatting from Unposed Multi-View Videos

Project Website | Paper | GitHub Repo

This work presents NoPo4D, the first feed-forward system that jointly addresses dynamic content, multi-view input, and unknown camera poses in a single pass. Its pose-free 4D reconstruction rests on two key ideas:

  • 💎 A decomposed velocity representation splits Gaussian motion into per-pixel image-plane shifts and depth changes. This allows direct supervision from 2D optical flow, obviating the need for complex 3D motion ground truth or differentiable rendering.
  • ✨ A bidirectional motion encoder paired with view-dependent opacity effectively aggregates cross-view features and mitigates cross-timestep Gaussian misalignments.
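
To make the first point concrete, here is a minimal sketch of how an image-plane shift plus a depth change can be lifted to a 3D displacement. It assumes a pinhole camera with pixel-unit intrinsics (`fx`, `fy`, `cx`, `cy`) and works in the source camera's coordinate frame. The function names are hypothetical, and this is an illustration of the idea rather than NoPo4D's actual implementation.

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) at the given depth to a 3D point in camera coordinates."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def displacement_from_shift(u, v, depth, shift_uv, depth_change, fx, fy, cx, cy):
    """3D displacement implied by a 2D pixel shift and a depth change.

    Assumption: both endpoints are expressed in the same camera frame.
    """
    du, dv = shift_uv
    start = unproject(u, v, depth, fx, fy, cx, cy)
    end = unproject(u + du, v + dv, depth + depth_change, fx, fy, cx, cy)
    return tuple(e - s for s, e in zip(start, end))


# Illustrative values only: a pixel at the principal point moves 10 px to the
# right while its depth grows from 2.0 to 2.5.
motion = displacement_from_shift(
    u=50.0, v=50.0, depth=2.0,
    shift_uv=(10.0, 0.0), depth_change=0.5,
    fx=100.0, fy=100.0, cx=50.0, cy=50.0,
)
```

Because the image-plane component `(du, dv)` lives in pixel space, it can be compared against a 2D optical-flow estimate directly, which is the property the first bullet highlights.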

πŸ† NoPo4D consistently outperforms prior feed-forward baselines across four multi-view dynamic benchmarks (ExoRecon, Immersive Light Field, Kubric, and N3DV). With an optional post-optimization stage, it surpasses per-scene optimization methods while running orders of magnitude faster.

Quick Start

For full installation instructions (including the custom Depth Anything 3 backbone and optional torch-scatter), please refer to the NoPo4D GitHub repository.

Once the dependencies are installed, you can use the model directly from the Hugging Face Hub via our Python API:

import torch
from src.model.nopo4d import NoPo4D

# Load pretrained model from Hugging Face
model = NoPo4D.from_pretrained("bralani01/nopo4d")
model = model.to("cuda").eval()

# Define your inputs
# images:     (B, V, 3, H, W), camera-major order
#             V = num_cameras * num_frames
#             e.g. 2 cameras x 3 frames -> [cam0_t0, cam0_t1, cam0_t2, cam1_t0, cam1_t1, cam1_t2]
# timestamps: (B, V) in [0, 1], same layout as images; pass None for static scenes

# Run the Encoder
encoder_output = model(
    images=images,
    timestamps=timestamps,
    num_cameras=num_cameras,
)
# encoder_output.gaussians     : 4D Gaussian primitives
# encoder_output.camera_pose   : predicted extrinsics / intrinsics
# encoder_output.depth         : per-view depth maps
# encoder_output.optical_flow  : per-view forward / backward flow

# Render Novel Views
render_output = model.render(
    gaussians=encoder_output.gaussians,
    extrinsics=target_extrinsics,    # (B, V, 4, 4)  c2w matrices
    intrinsics=target_intrinsics,    # (B, V, 3, 3)  normalised intrinsics
    image_shape=(H, W),
    timestamps=target_timestamps,    # (B, V) or None
)
# render_output.color: (B, V, 3, H, W)
# render_output.depth: (B, V, H, W)
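
The camera-major layout described in the input comments can be sketched in plain Python as below. The helper names are hypothetical and not part of the NoPo4D API. Mapping evenly spaced frames linearly onto [0, 1] is an assumption; the snippet above only states that timestamps lie in [0, 1] and share the layout of `images`.

```python
def camera_major_layout(num_cameras, num_frames):
    """Return (labels, timestamps) for V = num_cameras * num_frames views.

    Views are ordered camera by camera: all frames of camera 0 first,
    then all frames of camera 1, and so on.
    """
    if num_cameras < 1 or num_frames < 1:
        raise ValueError("num_cameras and num_frames must be positive")
    labels, timestamps = [], []
    for cam in range(num_cameras):
        for t in range(num_frames):
            labels.append(f"cam{cam}_t{t}")
            # Assumption: frames are evenly spaced, so frame t maps to t / (T - 1).
            timestamps.append(t / (num_frames - 1) if num_frames > 1 else 0.0)
    return labels, timestamps


def flat_index(cam, t, num_frames):
    """Position of (camera, frame) along the V axis in camera-major order."""
    return cam * num_frames + t


labels, timestamps = camera_major_layout(num_cameras=2, num_frames=3)
```

If the frames are already stored as a tensor of shape `(num_cameras, num_frames, 3, H, W)`, a row-major `reshape` to `(1, V, 3, H, W)` produces this same ordering, since the camera axis varies slowest.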

Citation

@article{nopo4d,
  title={NoPo4D: No Pose, No Problem in 4D},
  author={TODO},
  journal={arXiv preprint arXiv:TODO},
  year={2026}
}

License

The code and models are licensed under the MIT License.

Acknowledgement

We thank the authors of these excellent works:

Model size: 0.7B params (Safetensors; tensor types: F32, BF16)