Add/improve model card for LVSM
#1 opened by nielsr (HF Staff)
README.md CHANGED
@@ -1,3 +1,61 @@
---
license: mit
pipeline_tag: image-to-image
library_name: pytorch
---
# LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

This repository contains a re-implementation of the LVSM model described in [LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias](https://arxiv.org/abs/2410.17242). The provided checkpoints are from the original Adobe implementation.

Project Page: https://haian-jin.github.io/projects/LVSM/
## Checkpoints

The scene-level evaluation is conducted on the [RealEstate10K](http://schadenfreude.csail.mit.edu:8000/) dataset.
| Model | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|-------|--------|--------|---------|
| [LVSM Decoder-Only Scene-Level res256 (full)](https://huggingface.co/coast01/LVSM/resolve/main/scene_decoder_only_256.pt?download=true) | 29.67 | 0.906 | 0.098 |
| [LVSM Encoder-Decoder Scene-Level res256 (full)](https://huggingface.co/coast01/LVSM/resolve/main/scene_encoder_decoder_256.pt?download=true) | 28.60 | 0.893 | 0.114 |
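The checkpoints can also be downloaded programmatically. A minimal sketch using `huggingface_hub`; the repository id and filename below are taken from the links in the table above:

```python
from huggingface_hub import hf_hub_download

# Download the decoder-only scene-level checkpoint from this repository
checkpoint_path = hf_hub_download(
    repo_id="coast01/LVSM",
    filename="scene_decoder_only_256.pt",
)
print(checkpoint_path)  # local path to the cached .pt file
```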
## How to use

This example demonstrates loading a pre-trained LVSM Decoder-Only model with `torch`. Make sure to install the necessary dependencies, obtain the LVSM code from the project repository, and download a checkpoint first.
```python
import torch

# NOTE: the import path and the `load_from_checkpoint` helper below are assumptions;
# check the project repository for the actual class name and loading API.
from lvsm.models.lvsm import LVSM

# Load the model from a downloaded checkpoint
checkpoint_path = "path/to/your/checkpoint.pt"  # e.g. scene_decoder_only_256.pt
model = LVSM.load_from_checkpoint(checkpoint_path)

# Move the model to GPU if available and switch to inference mode
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Prepare the input (replace with your actual posed input views)
# input_views = ...

# Perform inference
# with torch.no_grad():
#     output = model(input_views)
```
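If the class or loading helper above does not match the released code, the raw checkpoint file can still be inspected with plain PyTorch. A small sketch, using the filename from the table above:

```python
import torch

# Load the raw checkpoint onto the CPU and look at its top-level structure
ckpt = torch.load("scene_decoder_only_256.pt", map_location="cpu")
if isinstance(ckpt, dict):
    # e.g. state-dict entries or config/weight sections, depending on how it was saved
    print(list(ckpt.keys())[:10])
```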
## Citation

```bibtex
@inproceedings{jin2025lvsm,
  title={LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias},
  author={Haian Jin and Hanwen Jiang and Hao Tan and Kai Zhang and Sai Bi and Tianyuan Zhang and Fujun Luan and Noah Snavely and Zexiang Xu},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=QQBPWtvtcn}
}
```
## Acknowledgement

We thank Kalyan Sunkavalli for helpful discussions and support. This work was done when Haian Jin, Hanwen Jiang, and Tianyuan Zhang were research interns at Adobe Research. This work was also partly funded by the National Science Foundation (IIS-2211259, IIS-2212084).