itriedcoding and KangLiao committed on
Commit ba4b571 · 0 Parent(s)

Duplicate from KangLiao/Puffin


Co-authored-by: Kang Liao <KangLiao@users.noreply.huggingface.co>

Files changed (7)
  1. .gitattributes +195 -0
  2. LICENSE +35 -0
  3. Puffin-Align.pth +3 -0
  4. Puffin-Base.pth +3 -0
  5. Puffin-Instruct.pth +3 -0
  6. Puffin-Thinking.pth +3 -0
  7. README.md +104 -0
.gitattributes ADDED
@@ -0,0 +1,195 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ sample_132849/00_input_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_132849/01_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_132849/02_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_132849/03_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_132849/03_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_132849/04_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_132849/04_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_132849/05_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_132849/05_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_132849/06_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_132849/06_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_132849/07_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_132849/07_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_132849/camera_trajectory.png filter=lfs diff=lfs merge=lfs -text
+ sample_15922/00_input_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_15922/01_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_15922/02_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_15922/03_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_15922/03_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_15922/04_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_15922/04_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_15922/05_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_15922/05_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_15922/06_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_15922/06_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_15922/07_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_15922/07_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_15922/camera_trajectory.png filter=lfs diff=lfs merge=lfs -text
+ sample_209274/00_input_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_209274/01_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_209274/02_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_209274/03_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_209274/03_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_209274/04_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_209274/04_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_209274/05_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_209274/05_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_209274/06_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_209274/06_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_209274/07_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_209274/07_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_209274/camera_trajectory.png filter=lfs diff=lfs merge=lfs -text
+ sample_280530/00_input_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_280530/01_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_280530/02_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_280530/03_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_280530/03_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_280530/04_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_280530/04_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_280530/05_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_280530/05_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_280530/06_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_280530/06_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_280530/07_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_280530/07_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_280530/camera_trajectory.png filter=lfs diff=lfs merge=lfs -text
+ sample_507991/00_input_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_507991/01_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_507991/02_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_507991/03_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_507991/03_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_507991/04_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_507991/04_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_507991/05_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_507991/05_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_507991/06_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_507991/06_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_507991/07_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ sample_507991/07_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ sample_507991/camera_trajectory.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_132849/00_input_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_132849/01_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_132849/02_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_132849/03_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_132849/03_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_132849/04_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_132849/04_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_132849/05_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_132849/05_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_132849/06_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_132849/06_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_132849/07_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_132849/07_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_132849/camera_trajectory.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_15922/00_input_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_15922/01_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_15922/02_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_15922/03_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_15922/03_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_15922/04_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_15922/04_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_15922/05_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_15922/05_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_15922/06_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_15922/06_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_15922/07_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_15922/07_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_15922/camera_trajectory.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_209274/00_input_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_209274/01_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_209274/02_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_209274/03_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_209274/03_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_209274/04_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_209274/04_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_209274/05_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_209274/05_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_209274/06_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_209274/06_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_209274/07_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_209274/07_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_209274/camera_trajectory.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_280530/00_input_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_280530/01_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_280530/02_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_280530/03_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_280530/03_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_280530/04_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_280530/04_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_280530/05_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_280530/05_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_280530/06_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_280530/06_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_280530/07_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_280530/07_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_280530/camera_trajectory.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_507991/00_input_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_507991/01_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_507991/02_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_507991/03_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_507991/03_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_507991/04_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_507991/04_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_507991/05_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_507991/05_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_507991/06_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_507991/06_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_507991/07_novel_view_gen.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_507991/07_target_gt.png filter=lfs diff=lfs merge=lfs -text
+ gen_sd3p5L_view8_N3_scheduler_cam_proj_shift3_dl3dv_re10k/sample_507991/camera_trajectory.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_0.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_1.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_10.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_11.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_12.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_13.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_14.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_15.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_16.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_17.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_18.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_19.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_2.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_20.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_21.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_22.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_23.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_24.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_25.png filter=lfs diff=lfs merge=lfs -text
+ bryan-goff-IuyhXAia8EA-unsplash_26.png filter=lfs diff=lfs merge=lfs -text
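This `.gitattributes` routes files through Git LFS by pattern: `*.pth`, for example, sends all four Puffin checkpoints to LFS, while `README.md` and `LICENSE` stay ordinary Git blobs. The sketch below approximates that matching with Python's `fnmatch`. It is a simplification, not Git's own matcher. It assumes that slash-free patterns match a file's basename at any depth and that patterns containing `/` match the full repo-relative path. It also ignores the special meaning of `**`, negation and macro attributes. For an authoritative answer, `git check-attr filter -- <path>` asks Git directly. The function names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def lfs_patterns(gitattributes_text: str) -> list[str]:
    """Collect the patterns whose attribute list includes filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


def is_lfs_tracked(path: str, patterns: list[str]) -> bool:
    """Approximate gitattributes matching (simplified, see the caveats above)."""
    name = PurePosixPath(path).name
    for pattern in patterns:
        # Slash-free patterns are tested against the basename;
        # patterns containing "/" are tested against the whole path.
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


# A few lines copied from the .gitattributes above.
GITATTRIBUTES = """\
*.pth filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
sample_15922/00_input_gt.png filter=lfs diff=lfs merge=lfs -text
"""

if __name__ == "__main__":
    pats = lfs_patterns(GITATTRIBUTES)
    for f in ["Puffin-Base.pth", "README.md", "sample_15922/00_input_gt.png"]:
        print(f, is_lfs_tracked(f, pats))
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every OS, as it is in Git by default.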
LICENSE ADDED
@@ -0,0 +1,35 @@
+ S-Lab License 1.0
+
+ Copyright 2025 S-Lab
+
+ Redistribution and use for non-commercial purpose in source and
+ binary forms, with or without modification, are permitted provided
+ that the following conditions are met:
+
+ 1. Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in
+ the documentation and/or other materials provided with the
+ distribution.
+
+ 3. Neither the name of the copyright holder nor the names of its
+ contributors may be used to endorse or promote products derived
+ from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ In the event that redistribution and/or use for commercial purpose in
+ source or binary forms, with or without modification is required,
+ please contact the contributor(s) of the work.
Puffin-Align.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f4a7c72ba84d243fb8f7bba12625edef11dc60807e2318afb52f6bf1a4868e4d
+ size 455123281
Puffin-Base.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4045661c81b29adc8aa1cc22079ddbbf86d353d2e0f35c0ffec310f193504257
+ size 8902683781
Puffin-Instruct.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6603d9cefcb28505981cc645d61b78def760b1956901f02b6db389df2218ff27
+ size 7599344245
Puffin-Thinking.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e157dc788f7bf512d0439166ad00f748ae62df580c87a02d43e67c326e1fd7d3
+ size 8902683829
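Each `.pth` file added in this commit is a Git LFS pointer, not the weights themselves. `oid` is the SHA-256 digest of the real file, and `size` is its length in bytes. Puffin-Base.pth, for instance, is 8,902,683,781 bytes, about 8.9 GB. The minimal sketch below parses such a pointer and checks a downloaded file against it using only the standard library. It assumes the checkpoint already sits at a local path you supply. The helper names and the `checkpoints/` path are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("oid", "").startswith("sha256:"):
        raise ValueError("expected an 'oid sha256:<hex>' line")
    return fields


def matches_pointer(path: Path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and SHA-256 recorded in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    if path.stat().st_size != int(fields["size"]):
        return False  # cheap size check first; hashing a multi-GB file takes a while
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Stream the file in 1 MiB chunks so large checkpoints never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == fields["oid"].removeprefix("sha256:")


# Pointer for Puffin-Align.pth, copied from the diff above.
ALIGN_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f4a7c72ba84d243fb8f7bba12625edef11dc60807e2318afb52f6bf1a4868e4d
size 455123281
"""

if __name__ == "__main__":
    ckpt = Path("checkpoints/Puffin-Align.pth")  # hypothetical download location
    if ckpt.exists():
        print("OK" if matches_pointer(ckpt, ALIGN_POINTER) else "mismatch")
```

The code checks the size before hashing, so a truncated download is rejected without reading the whole file.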
README.md ADDED
@@ -0,0 +1,104 @@
+ ---
+ tags:
+ - unified multimodal model
+ - camera-centric
+ - generation
+ - understanding
+ - spatial intelligence
+ - 3D vision
+ pipeline_tag: text-to-3d
+ license: other
+ ---
+
+ # **Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation**
+
+ <p align="center">
+ <img src="https://github.com/KangLiao929/Puffin/blob/main/assets/website/tesear_horizon.png?raw=true" alt="Thinking with Camera" width="100%">
+ </p>
+
+ ## Paper
+ This model was presented in the paper [Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation](https://huggingface.co/papers/2510.08673).
+
+ ## Abstract
+ Camera-centric understanding and generation are two cornerstones of spatial intelligence, yet they are typically studied in isolation. We present Puffin, a unified camera-centric multimodal model that extends spatial awareness along the camera dimension. Puffin integrates language regression and diffusion-based generation to interpret and create scenes from arbitrary viewpoints. To bridge the modality gap between cameras and vision-language, we introduce a novel paradigm that treats camera as language, enabling thinking with camera. This guides the model to align spatially grounded visual cues with photographic terminology while reasoning across geometric context. Puffin is trained on Puffin-4M, a large-scale dataset of 4 million vision-language-camera triplets. We incorporate both global camera parameters and pixel-wise camera maps, yielding flexible and reliable spatial generation. Experiments demonstrate Puffin's superior performance over specialized models for camera-centric generation and understanding. With instruction tuning, Puffin generalizes to diverse cross-view tasks such as spatial imagination, world exploration, and photography guidance.
+
+ ## Links
+ * **Project Page**: [https://kangliao929.github.io/projects/puffin](https://kangliao929.github.io/projects/puffin)
+ * **GitHub Repository**: [https://github.com/KangLiao929/Puffin](https://github.com/KangLiao929/Puffin)
+ * **Hugging Face Space**: [https://huggingface.co/spaces/KangLiao/Puffin](https://huggingface.co/spaces/KangLiao/Puffin)
+ * **Hugging Face Dataset**: [https://huggingface.co/datasets/KangLiao/Puffin-4M](https://huggingface.co/datasets/KangLiao/Puffin-4M)
+
+ ## Model Details
+
+ Puffin is a unified camera-centric multimodal model that extends spatial awareness along the camera dimension. It learns **camera-centric** understanding and generation tasks in **a unified multimodal framework**. To bridge the modality gap between cameras and vision-language, we introduce a novel paradigm that treats camera as language, enabling **thinking with camera**. This guides the model to align spatially grounded visual cues with photographic terminology while reasoning across geometric context.
+
+ | | |
+ |---|---|
+ | **Developed by** | Kang Liao, Size Wu, Zhonghua Wu, Linyi Jin, Chao Wang, Yikai Wang, Fei Wang, Wei Li, Chen Change Loy |
+ | **Affiliation** | S-Lab, Nanyang Technological University |
+ | **First released** | arXiv pre-print, 2025 |
+ | **Model type** | Unified multimodal model (diffusion / autoregressive modelling with camera-centric understanding and generation) |
+ | **Modality** | Image → Text+Camera; Text+Camera → Image; Image+Camera → Image; Image+Camera → Text |
+
+ ---
+
+ ### Direct Use
+ - **Camera-centric understanding and generation** from a single image or from a text prompt paired with camera parameters, with support for the thinking mode.
+ - **World exploration**: performs cross-view generation from a given initial view and a target camera configuration.
+ - **Spatial imagination**: imagines and describes the scene as it would appear from a target camera configuration, given an initial view.
+ - **3D virtual object insertion** in AR/VR: assists virtual 3D object insertion into in-the-wild images by calibrating camera parameters.
+
+ ## Sample Usage
+
+ This section demonstrates how to generate images with camera control using Puffin-Base, based on the examples provided in the [GitHub repository](https://github.com/KangLiao929/Puffin).
+
+ First, download the model checkpoints from 🤗 [KangLiao/Puffin](https://huggingface.co/KangLiao/Puffin) and organize them in a `checkpoints` directory, for example:
+ ```text
+ Puffin/
+ └── checkpoints/
+     ├── Puffin-Align.pth      # provided for customized SFT
+     ├── Puffin-Base.pth
+     ├── Puffin-Thinking.pth
+     └── Puffin-Instruct.pth
+ ```
+ You can use `huggingface-cli` to download the checkpoints:
+ ```bash
+ # pip install -U "huggingface_hub[cli]"
+ huggingface-cli download KangLiao/Puffin --local-dir checkpoints --repo-type model
+ ```
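Before running the demos, it can help to confirm the layout shown above. The sketch below is an illustrative helper, not part of the Puffin codebase. It reports which of the four expected checkpoint files are missing from `checkpoints/`. It can also fetch only the `.pth` files with `huggingface_hub.snapshot_download`, whose `allow_patterns` argument restricts the download to matching files. The `EXPECTED` list and both function names are hypothetical.

```python
from pathlib import Path

# File names from the directory layout above; the list itself is an assumption.
EXPECTED = ["Puffin-Align.pth", "Puffin-Base.pth", "Puffin-Thinking.pth", "Puffin-Instruct.pth"]


def missing_checkpoints(ckpt_dir: str = "checkpoints") -> list[str]:
    """Return the expected checkpoint files that are not present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]


def download_checkpoints(ckpt_dir: str = "checkpoints") -> str:
    """Fetch only the .pth files from the KangLiao/Puffin model repo; returns the local folder."""
    # Imported lazily so the layout check works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id="KangLiao/Puffin", local_dir=ckpt_dir, allow_patterns=["*.pth"])


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("Missing: " + ", ".join(missing) if missing else "All checkpoints present.")
```

Restricting the download to `*.pth` skips the README, LICENSE and `.gitattributes`, which the CLI command above also places in `checkpoints/`.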
+
+ To run camera-controllable image generation (the script and config paths below are relative to the root of the GitHub repository):
+
+ ```shell
+ export PYTHONPATH=./:$PYTHONPATH
+ python scripts/demo/generation.py configs/pipelines/stage_2_base.py \
+ --checkpoint checkpoints/Puffin-Base.pth --output generation_result.jpg \
+ --prompt "A streetlamp casts light on an outdoor mural with intricate floral designs and text, set against a building wall." \
+ -r -0.3939 -p 0.0277 -f 0.7595
+ ```
+ This command generates an image from the provided text prompt and camera parameters (roll: `-r`, pitch: `-p`, vertical field of view: `-f`, all in radians). The output image is saved as `generation_result.jpg`.
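Because `-r`, `-p` and `-f` take radians, the example values correspond to roughly -22.6° roll, 1.6° pitch and a 43.5° vertical field of view. If you think in degrees, a small helper like the hypothetical `camera_flags` below can build the three flags. It only converts units and formats the numbers to four decimals, as in the example. It assumes `generation.py` accepts the flags exactly as shown in the command above.

```python
import math


def camera_flags(roll_deg: float, pitch_deg: float, vfov_deg: float) -> list[str]:
    """Convert roll, pitch and vertical FoV from degrees to the radian flags used by generation.py."""
    if not 0 < vfov_deg < 180:
        raise ValueError("vertical field of view must lie in (0, 180) degrees")
    flags = []
    for flag, degrees in (("-r", roll_deg), ("-p", pitch_deg), ("-f", vfov_deg)):
        flags += [flag, f"{math.radians(degrees):.4f}"]
    return flags


if __name__ == "__main__":
    # Recover the degree values behind the README example.
    for name, rad in [("roll", -0.3939), ("pitch", 0.0277), ("vfov", 0.7595)]:
        print(f"{name}: {math.degrees(rad):.2f} deg")
    print(" ".join(camera_flags(-45, 10, 60)))
```

The returned list can be appended to the argument list of the command above or joined into a shell string.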
+
+ To enable thinking mode for image generation, switch to the Stage 3 thinking config and the Puffin-Thinking checkpoint, and append the `--thinking` flag:
+
+ ```shell
+ python scripts/demo/generation.py configs/pipelines/stage_3_thinking.py \
+ --checkpoint checkpoints/Puffin-Thinking.pth --output generation_result_thinking.jpg \
+ --prompt "A streetlamp casts light on an outdoor mural with intricate floral designs and text, set against a building wall." \
+ -r -0.3939 -p 0.0277 -f 0.7595 \
+ --thinking
+ ```
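The two runs differ only in the config file, the checkpoint and the trailing `--thinking` flag. The hypothetical Python wrapper below keeps those per-mode settings together and assembles the argument list for `subprocess.run`. The names `MODES`, `build_command` and `run` are illustrative and do not come from the repository. It assumes you launch it from the repository root with `PYTHONPATH` set as above.

```python
import subprocess
import sys

# Per-mode settings copied from the two commands above: (config, checkpoint, extra flags).
MODES = {
    "base": ("configs/pipelines/stage_2_base.py", "checkpoints/Puffin-Base.pth", []),
    "thinking": ("configs/pipelines/stage_3_thinking.py", "checkpoints/Puffin-Thinking.pth", ["--thinking"]),
}


def build_command(mode: str, prompt: str, roll: float, pitch: float, vfov: float, output: str) -> list[str]:
    """Assemble the generation.py argv for the given mode; camera values are in radians."""
    config, checkpoint, extra = MODES[mode]
    return [
        sys.executable, "scripts/demo/generation.py", config,
        "--checkpoint", checkpoint, "--output", output,
        "--prompt", prompt,
        "-r", str(roll), "-p", str(pitch), "-f", str(vfov),
        *extra,
    ]


def run(mode: str, **kwargs) -> None:
    """Run the demo script; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_command(mode, **kwargs), check=True)


if __name__ == "__main__":
    cmd = build_command("thinking", "A streetlamp casts light on an outdoor mural.",
                        -0.3939, 0.0277, 0.7595, "generation_result_thinking.jpg")
    print(" ".join(cmd))
```

Passing an argument list rather than a shell string avoids quoting problems with prompts that contain spaces or punctuation. `sys.executable` ensures the script runs under the same interpreter as the wrapper.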
+
+ ### Citation
+ If you find Puffin useful for your research or applications, please cite our paper using the following BibTeX:
+
+ ```bibtex
+ @article{liao2025puffin,
+ title={Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation},
+ author={Liao, Kang and Wu, Size and Wu, Zhonghua and Jin, Linyi and Wang, Chao and Wang, Yikai and Wang, Fei and Li, Wei and Loy, Chen Change},
+ journal={arXiv preprint arXiv:2510.08673},
+ year={2025}
+ }
+ ```
+
+ ### License
+ This project is licensed under [NTU S-Lab License 1.0](LICENSE).