## Important configuration options for [inference.py](../inference.py):

### 1. General configs
| Configuration | Default | Explanation |
|:------------- |:----- | :------------- |
| `--image_dir` | './test/images/fruit.png' | Image file path |
| `--out_dir` | './output' | Output directory |
| `--device` | 'cuda:0' | The device to use |
| `--exp_name` | None | Experiment name; defaults to the image file name |
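
As a rough illustration of how these general options could be declared and how the `--exp_name` fallback might work, here is a minimal `argparse` sketch. The function names `build_general_parser` and `resolve_exp_name` are hypothetical, and stripping the file extension from the fallback name is an assumption; the actual definitions in `inference.py` may differ.

```python
import argparse
import os


def build_general_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the general options in the table above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--image_dir", type=str, default="./test/images/fruit.png")
    parser.add_argument("--out_dir", type=str, default="./output")
    parser.add_argument("--device", type=str, default="cuda:0")
    parser.add_argument("--exp_name", type=str, default=None)
    return parser


def resolve_exp_name(args: argparse.Namespace) -> str:
    """Return the explicit experiment name, or fall back to the image file name."""
    if args.exp_name is not None:
        return args.exp_name
    # Assumption: the fallback drops the directory and the file extension.
    return os.path.splitext(os.path.basename(args.image_dir))[0]


if __name__ == "__main__":
    args = build_general_parser().parse_args([])
    print(resolve_exp_name(args))
```

With the default `--image_dir`, this sketch resolves the experiment name to `fruit`.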
### 2. Point cloud render configs
#### The world coordinate system and tips for adjusting the point cloud render configs are described in the [render document](./render_help.md).
| Configuration | Default | Explanation |
|:------------- |:----- | :------------- |
| `--mode` | 'single_view_txt' | Rendering mode. Currently supported: 'single_view_txt' and 'single_view_target' |
| `--traj_txt` | None | Required for 'single_view_txt' mode; a txt file that specifies the camera trajectory |
| `--elevation` | 5. | The elevation angle of the input image, in degrees. Estimate a rough value based on your visual judgment |
| `--center_scale` | 1. | Scale factor for the spherical radius (r). By default, r is set to the depth value of the center pixel (H//2, W//2) of the reference image |
| `--d_theta` | 10. | Required for 'single_view_target' mode; specifies the target theta angle as (theta + d_theta) |
| `--d_phi` | 30. | Required for 'single_view_target' mode; specifies the target phi angle as (phi + d_phi) |
| `--d_r` | -.2 | Required for 'single_view_target' mode; specifies the target radius as (r + r*d_r) |
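
The three offsets for 'single_view_target' mode can be read as a small arithmetic step on the reference camera's spherical coordinates. The sketch below only restates the formulas from the table; the function name `target_pose` is hypothetical, and the reference values of theta, phi, and r are taken as plain inputs because how `inference.py` derives them is not covered here.

```python
def target_pose(theta, phi, r, d_theta=10.0, d_phi=30.0, d_r=-0.2):
    """Apply the table's offsets to a reference pose (hypothetical helper).

    theta and phi are angles in degrees; r is the spherical radius.
    """
    target_theta = theta + d_theta  # theta + d_theta
    target_phi = phi + d_phi        # phi + d_phi
    target_r = r + r * d_r          # d_r is relative: a fraction of r
    return target_theta, target_phi, target_r


if __name__ == "__main__":
    print(target_pose(theta=0.0, phi=0.0, r=2.0))
```

Note that `d_r` is a relative change rather than an absolute distance: the default of -0.2 moves the camera about 20% closer to the center of the sphere, whatever the value of r.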
### 3. Diffusion configs
| Configuration | Default | Explanation |
|:------------- |:----- | :------------- |
| `--ckpt_path` | './checkpoints/ViewCrafter_25.ckpt' | Checkpoint path |
| `--config` | './configs/inference_pvd_1024.yaml' | Config (yaml) path |
| `--ddim_steps` | 50 | Number of DDIM steps if positive; otherwise DDPM is used. Reduce to 10 to speed up inference |
| `--ddim_eta` | 1.0 | Eta for DDIM sampling (0.0 yields deterministic sampling) |
| `--bs` | 1 | Batch size for inference; should be 1 |
| `--height` | 576 | Image height, in pixel space |
| `--width` | 1024 | Image width, in pixel space |
| `--frame_stride` | 10 | Fixed |
| `--unconditional_guidance_scale` | 7.5 | Classifier-free guidance scale for the prompt |
| `--seed` | 123 | Seed for seed_everything |
| `--video_length` | 25 | Inference video length in frames; change to 16 if you use the 16-frame model |
| `--negative_prompt` | False | Unused |
| `--text_input` | False | Unused |
| `--prompt` | 'Rotating view of a scene' | Fixed |
| `--multiple_cond_cfg` | False | Whether to use multi-condition classifier-free guidance (CFG) |
| `--cfg_img` | None | Guidance scale for image conditioning |
| `--timestep_spacing` | "uniform_trailing" | How the timesteps should be scaled. Refer to Table 2 of [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) for more information. |
| `--guidance_rescale` | 0.7 | Guidance rescale in [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) |
| `--perframe_ae` | True | Whether to use per-frame AE decoding. Set it to True to save GPU memory, especially for the 576x1024 model |
| `--n_samples` | 1 | Number of samples per prompt |
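
To give a feel for how `--unconditional_guidance_scale` and `--guidance_rescale` interact, here is a minimal sketch of classifier-free guidance followed by the rescaling step described in the paper linked above. It is an illustration under simplifying assumptions, not the code used by `inference.py`: the function name `cfg_with_rescale` is hypothetical, and it works on flat lists of floats with a population standard deviation, whereas a real implementation would work on tensors and compute the statistics per sample.

```python
from statistics import pstdev


def cfg_with_rescale(cond, uncond, guidance_scale=7.5, guidance_rescale=0.7):
    """Classifier-free guidance with std-based rescaling (illustrative sketch).

    cond and uncond are equal-length lists of floats standing in for the
    conditional and unconditional model predictions.
    """
    # Standard classifier-free guidance.
    guided = [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]

    std_cond = pstdev(cond)
    std_guided = pstdev(guided)
    if std_guided == 0.0:
        # Nothing to rescale against; return the guided prediction unchanged.
        return guided

    # Factor that would match the guided std to the conditional std.
    factor = std_cond / std_guided
    # Blend between full rescaling (1.0) and no rescaling (0.0).
    factor = guidance_rescale * factor + (1.0 - guidance_rescale)
    return [x * factor for x in guided]


if __name__ == "__main__":
    print(cfg_with_rescale([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

In this sketch, a `guidance_rescale` of 0.0 leaves the guided prediction untouched, while 1.0 scales it so that its standard deviation matches that of the conditional prediction; the default of 0.7 sits between the two.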