Update app.py

app.py CHANGED
@@ -414,7 +414,7 @@ with gr.Blocks(title="OneDiffusion Demo") as demo:
 
 2. **Upload Images**: Drag and drop images directly onto the upload area, or click to select files from your device.
 
- 3. **Generate Captions**: **If you upload any images**, Click the "Generate Captions" button to format the text prompt according to chosen task. In this demo, you will NEED to provide the caption of each source image manually.
+ 3. **Generate Captions**: **If you upload any images**, click the "Generate Captions" button to format the text prompt according to the chosen task. In this demo, you will **NEED** to provide the caption of each source image manually. We recommend using Molmo for captioning.
 
 4. **Configure Generation Settings**: Expand the "Advanced Configuration" section to adjust parameters like the number of inference steps, guidance scale, image size, and more.
 
@@ -430,7 +430,7 @@ with gr.Blocks(title="OneDiffusion Demo") as demo:
 
- For boundingbox2image/semantic2image/inpainting etc tasks:
+ To perform condition-to-image tasks such as semantic map to image, follow the steps above.
+ For image-to-condition tasks, e.g., image to depth, change the denoise_mask checkboxes before generating images: you must UNCHECK the image_0 box and CHECK the image_1 box. A caption is not required for this task.
 
- For FaceID tasks:
+ Use 3 or 4 images if a single input image does not give satisfactory results.
 
@@ -440,7 +440,7 @@ with gr.Blocks(title="OneDiffusion Demo") as demo:
+ If you have non-human subjects and do not get satisfactory results, try "copying" the part of the source images' captions that describes the subject's properties, e.g., a monster with red eyes, sharp teeth, etc.
 
- For Multiview generation:
+ The input camera elevation/azimuth ALWAYS starts with 0. If you want to generate images at azimuths 30, 60, 90 and elevations 10, 20, 30 (w.r.t. the input image), the correct input azimuth is `0, 30, 60, 90`, the input elevation is `0, 10, 20, 30`, and the camera distance will be `1.5, 1.5, 1.5, 1.5`.
+ Only square images are supported (ideally at 512x512 resolution).
+ Ensure the numbers of elevations, azimuths, and distances are equal.
+ The model generally works well for 2-5 views (including both input and generated images). Since the model is trained with 3 views at 512x512 resolution, you might try a scale_factor of [1.1; 1.5] and a scale_watershed of [100; 400] for better extrapolation.
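The image-to-condition note above boils down to a per-view denoise mask: the view left unchecked is kept as conditioning, and the view checked is the one the model denoises. A minimal sketch of that mapping, where `pipe` and its keyword names are hypothetical stand-ins for the demo's actual generation call (only the mask semantics come from the note):

```python
from PIL import Image

def run_image_to_depth(pipe, photo: Image.Image, prompt: str):
    """Sketch of an image-to-condition (image -> depth) call.

    `pipe`, `images`, and `denoise_mask` are illustrative names, not the
    real OneDiffusion API; the checkbox semantics follow the note above.
    """
    # image_0 = source photo (UNCHECKED -> kept fixed as conditioning)
    # image_1 = depth-map slot (CHECKED -> denoised, i.e. generated)
    denoise_mask = [False, True]
    views = [photo, None]  # the second view starts as noise and is generated
    return pipe(prompt=prompt, images=views, denoise_mask=denoise_mask)
```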
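To make the multiview camera bookkeeping concrete, here is a small self-contained sketch; the helper name and the 2-5 view assertion are illustrative additions, while the leading-zero convention, equal list lengths, and 1.5 camera distance come from the notes above:

```python
def build_camera_inputs(azimuths, elevations, distance=1.5):
    """Prepend the mandatory 0/0 input view and keep all three lists equal length."""
    azi = [0] + list(azimuths)      # the input view always comes first, at azimuth 0
    ele = [0] + list(elevations)    # ... and elevation 0
    dist = [distance] * len(azi)    # one camera distance per view, e.g. 1.5
    assert len(azi) == len(ele) == len(dist), "elevations/azimuths/distances must match"
    assert 2 <= len(azi) <= 5, "the model generally works well for 2-5 total views"
    return azi, ele, dist

# New views at azimuths 30/60/90 and elevations 10/20/30 w.r.t. the input image:
azimuths, elevations, distances = build_camera_inputs([30, 60, 90], [10, 20, 30])
print(azimuths)    # [0, 30, 60, 90]
print(elevations)  # [0, 10, 20, 30]
print(distances)   # [1.5, 1.5, 1.5, 1.5]
```

When generating more than the 3 views the model was trained on at 512x512, this is where the suggested scale_factor ([1.1; 1.5]) and scale_watershed ([100; 400]) settings would be applied in the Advanced Configuration.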