---
tags:
- vision
---

# DUSt3R

## Model info

DUSt3R (Dense and Unconstrained Stereo 3D Reconstruction) reconstructs dense 3D scenes from arbitrary image collections, without requiring known camera intrinsics or viewpoint poses. As the checkpoint name suggests, this model uses a ViT-Large encoder, a ViT-Base decoder, and a DPT head at 512 resolution.

Project page: https://dust3r.europe.naverlabs.com/

## How to use

Here's how to load the model (after [installing](https://github.com/naver/dust3r?tab=readme-ov-file#installation) the `dust3r` package):
```python
from dust3r.model import AsymmetricCroCo3DStereo
import torch

model = AsymmetricCroCo3DStereo.from_pretrained("nielsr/DUSt3R_ViTLarge_BaseDecoder_512_dpt")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
```
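
If you have already downloaded the checkpoint, `from_pretrained` should also accept a local directory path (an assumption based on standard Hugging Face `from_pretrained` behavior; the path below is hypothetical). Putting the model in eval mode is good practice for inference:

```python
from dust3r.model import AsymmetricCroCo3DStereo
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# hypothetical local directory holding a previously downloaded checkpoint
local_checkpoint = "checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt"
model = AsymmetricCroCo3DStereo.from_pretrained(local_checkpoint)
model = model.to(device).eval()  # eval mode disables dropout during inference
```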

Next, one can run inference as follows:
```python
from dust3r.inference import inference
from dust3r.utils.image import load_images
from dust3r.image_pairs import make_pairs
from dust3r.cloud_opt import global_aligner, GlobalAlignerMode

if __name__ == '__main__':
    batch_size = 1
    schedule = 'cosine'
    lr = 0.01
    niter = 300

    # load_images can take a list of images or a directory
    images = load_images(['croco/assets/Chateau1.png', 'croco/assets/Chateau2.png'], size=512)
    pairs = make_pairs(images, scene_graph='complete', prefilter=None, symmetrize=True)
    output = inference(pairs, model, device, batch_size=batch_size)

    # at this stage, you have the raw dust3r predictions
    view1, pred1 = output['view1'], output['pred1']
    view2, pred2 = output['view2'], output['pred2']
    # here, view1, pred1, view2, pred2 are dicts of lists of length 2
    # -> because we symmetrize, we have both the (im1, im2) and (im2, im1) pairs
    # each view contains:
    # - an integer image identifier: view1['idx'] and view2['idx']
    # - the image: view1['img'] and view2['img']
    # - the image shape: view1['true_shape'] and view2['true_shape']
    # - an instance string output by the dataloader: view1['instance'] and view2['instance']
    # pred1 and pred2 contain the confidence values: pred1['conf'] and pred2['conf']
    # pred1 contains 3D points for view1['img'] in view1['img'] space: pred1['pts3d']
    # pred2 contains 3D points for view2['img'] in view1['img'] space: pred2['pts3d_in_other_view']

    # next we'll use the global_aligner to align the predictions;
    # depending on your task, the raw output may already be all you need
    # with only two input images, you could use GlobalAlignerMode.PairViewer instead:
    # it just converts the output, and there is no need to run compute_global_alignment
    # (a short sketch follows this code block)
    scene = global_aligner(output, device=device, mode=GlobalAlignerMode.PointCloudOptimizer)
    loss = scene.compute_global_alignment(init="mst", niter=niter, schedule=schedule, lr=lr)

    # retrieve useful values from the scene:
    imgs = scene.imgs
    focals = scene.get_focals()
    poses = scene.get_im_poses()
    pts3d = scene.get_pts3d()
    confidence_masks = scene.get_masks()

    # visualize the reconstruction
    scene.show()

    # find 2D-2D matches between the two images
    from dust3r.utils.geometry import find_reciprocal_matches, xy_grid
    pts2d_list, pts3d_list = [], []
    for i in range(2):
        conf_i = confidence_masks[i].cpu().numpy()
        pts2d_list.append(xy_grid(*imgs[i].shape[:2][::-1])[conf_i])  # imgs[i].shape[:2] = (H, W)
        pts3d_list.append(pts3d[i].detach().cpu().numpy()[conf_i])
    reciprocal_in_P2, nn2_in_P1, num_matches = find_reciprocal_matches(*pts3d_list)
    print(f'found {num_matches} matches')
    matches_im1 = pts2d_list[1][reciprocal_in_P2]
    matches_im0 = pts2d_list[0][nn2_in_P1][reciprocal_in_P2]

    # visualize a few matches
    import numpy as np
    from matplotlib import pyplot as pl
    n_viz = 10
    match_idx_to_viz = np.round(np.linspace(0, num_matches - 1, n_viz)).astype(int)
    viz_matches_im0, viz_matches_im1 = matches_im0[match_idx_to_viz], matches_im1[match_idx_to_viz]

    H0, W0, H1, W1 = *imgs[0].shape[:2], *imgs[1].shape[:2]
    img0 = np.pad(imgs[0], ((0, max(H1 - H0, 0)), (0, 0), (0, 0)), 'constant', constant_values=0)
    img1 = np.pad(imgs[1], ((0, max(H0 - H1, 0)), (0, 0), (0, 0)), 'constant', constant_values=0)
    img = np.concatenate((img0, img1), axis=1)
    pl.figure()
    pl.imshow(img)
    cmap = pl.get_cmap('jet')
    for i in range(n_viz):
        (x0, y0), (x1, y1) = viz_matches_im0[i].T, viz_matches_im1[i].T
        pl.plot([x0, x1 + W0], [y0, y1], '-+', color=cmap(i / (n_viz - 1)), scalex=False, scaley=False)
    pl.show(block=True)
```
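
As the comments in the script note, with only two input images you can skip the global optimization and use `GlobalAlignerMode.PairViewer`, which simply converts the raw predictions. A minimal sketch, reusing `output` and `device` from above and assuming the PairViewer scene exposes the same accessors as the optimizer:

```python
from dust3r.cloud_opt import global_aligner, GlobalAlignerMode

# PairViewer just converts the two-view output;
# no call to compute_global_alignment is needed
scene = global_aligner(output, device=device, mode=GlobalAlignerMode.PairViewer)

imgs = scene.imgs
focals = scene.get_focals()
poses = scene.get_im_poses()
pts3d = scene.get_pts3d()
confidence_masks = scene.get_masks()
```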
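
To keep the reconstruction around, you can write the confident 3D points and their colors to a PLY file. The sketch below uses only NumPy and the `imgs`, `pts3d`, and `confidence_masks` values retrieved above, assuming (as the matching code does) that `scene.imgs` are H×W×3 float arrays in [0, 1]; the output filename is arbitrary:

```python
import numpy as np

# collect confident 3D points and their RGB colors from every view
all_pts, all_cols = [], []
for i in range(len(imgs)):
    mask = confidence_masks[i].cpu().numpy()
    all_pts.append(pts3d[i].detach().cpu().numpy()[mask])
    all_cols.append((imgs[i][mask] * 255).astype(np.uint8))
pts = np.concatenate(all_pts)
cols = np.concatenate(all_cols)

# write a minimal ASCII PLY file
with open('scene.ply', 'w') as f:
    f.write('ply\nformat ascii 1.0\n')
    f.write(f'element vertex {len(pts)}\n')
    f.write('property float x\nproperty float y\nproperty float z\n')
    f.write('property uchar red\nproperty uchar green\nproperty uchar blue\n')
    f.write('end_header\n')
    for (x, y, z), (r, g, b) in zip(pts, cols):
        f.write(f'{x:.6f} {y:.6f} {z:.6f} {r} {g} {b}\n')
```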

## BibTeX entry and citation info

```bibtex
@article{dust3r2023,
  title={{DUSt3R: Geometric 3D Vision Made Easy}},
  author={Wang, Shuzhe and Leroy, Vincent and Cabon, Yohann and Chidlovskii, Boris and Revaud, Jerome},
  journal={arXiv preprint arXiv:2312.14132},
  year={2023}
}
```