# Outpainting Pipeline

- The initial version of the project used a YOLOv8s segmentation model to produce a rough mask, which was then inverted for outpainting with models such as Runway's Stable Diffusion inpainting model, with ControlNet used to steer the outpainted output.
- That approach hit blockers: the mask was of poor quality, with rough edges that degraded the outpainting output, and even with techniques like blurring the mask, the early results with Stable Diffusion models were poor.
- To address this, the pipeline now detects the ROI of the object in focus and extends and resizes the image; the segmentation model was upgraded to Segment Anything (ViT-Huge), with a YOLOv8l detector providing the bounding boxes used as the box prompt, and the resulting mask is inverted for outpainting.
- The inpainting model was changed to kandinsky-2-2-decoder-inpaint, run with 800 inference steps and a guidance scale of 5.0 to 7.5, which produced the results below.
- GPU used: NVIDIA A100 40 GB
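
The mask-preparation steps described above (segment, invert the mask, extend the canvas for outpainting) can be sketched roughly as follows. This is an illustrative sketch only — the function names, the centered placement, and the white fill color are assumptions, not the repo's actual code:

```python
import numpy as np
from PIL import Image, ImageOps

def invert_mask(mask: Image.Image) -> Image.Image:
    """Invert a binary segmentation mask so the background
    (the region to outpaint) becomes white and the product black."""
    return ImageOps.invert(mask.convert("L"))

def extend_canvas(image: Image.Image, target: int = 1024) -> Image.Image:
    """Paste the product image onto a larger square canvas, leaving a
    border for the inpainting model to fill in (the 'outpainted' area)."""
    canvas = Image.new("RGB", (target, target), (255, 255, 255))
    offset = ((target - image.width) // 2, (target - image.height) // 2)
    canvas.paste(image, offset)
    return canvas

# Tiny example: a 2x2 mask where the product occupies the left column
mask = Image.fromarray(np.array([[255, 0], [255, 0]], dtype=np.uint8))
inverted = invert_mask(mask)
print(np.array(inverted).tolist())  # prints: [[0, 255], [0, 255]]
```

The inverted mask and extended canvas are then passed to the inpainting pipeline, which fills the white region around the product.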
## Installation
To install the necessary requirements, you can use pip:
```bash
pip install -r requirements.txt
wandb login
huggingface-cli login
cd scripts
```
This installs the libraries this project needs, including Pillow (PIL), Diffusers, Segment Anything, and wandb.

You can then run the pipeline with:
```bash
python run.py --image_path /path/to/image.jpg --prompt 'prompt' --negative_prompt 'negative prompt' --output_dir /path/to/output --mask_dir /path/to/mask --uid unique_id
```
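
For reference, the flags above correspond to an argument parser along these lines. This is an illustrative sketch — the actual `run.py` may define the arguments differently:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the CLI shown above; defaults and help text are assumptions.
    parser = argparse.ArgumentParser(description="Outpainting pipeline runner")
    parser.add_argument("--image_path", required=True, help="Path to the product image")
    parser.add_argument("--prompt", required=True, help="Prompt for the outpainted background")
    parser.add_argument("--negative_prompt", default="", help="Things to avoid in the generation")
    parser.add_argument("--output_dir", required=True, help="Directory for generated images")
    parser.add_argument("--mask_dir", required=True, help="Directory for intermediate masks")
    parser.add_argument("--uid", required=True, help="Unique id used to name output files")
    return parser

# Example invocation, matching the command above
args = build_parser().parse_args(
    ["--image_path", "image.jpg", "--prompt", "studio shot",
     "--output_dir", "out", "--mask_dir", "masks", "--uid", "42"]
)
print(args.uid)  # prints: 42
```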
### Some Experiments
Here are some of my experiments with the following models:
- https://huggingface.co/runwayml/stable-diffusion-inpainting
- https://huggingface.co/lllyasviel/sd-controlnet-seg
- https://huggingface.co/kandinsky-community/kandinsky-2-2-decoder-inpaint

![cooker_output](https://github.com/VikramxD/product_diffusion_api/assets/72499426/1228718b-5ef7-44a1-81f6-2953ffdc767c)
![toaster_output](https://github.com/VikramxD/product_diffusion_api/assets/72499426/06e12aea-cdc2-4ab8-97e0-be77bc49a238)
![chair](https://github.com/VikramxD/product_diffusion_api/assets/72499426/65bcd04f-a715-43c3-8928-a9669f8eda85)
![Generated Image Pipeline Call 1](https://github.com/VikramxD/product_diffusion_api/assets/72499426/dd6af644-1c07-424a-8ba6-0715a5611094)
![Generated Image Pipeline Call (1)](https://github.com/VikramxD/product_diffusion_api/assets/72499426/b1b8c745-deb4-41ff-a93a-77fa06f55cc3)
## Some Improvements
- Working on an API to deploy this model in batch mode, with logging of prompts and generated outputs
- Implementing a UI in Gradio / Streamlit to try the model out visually
- Experimenting with an image-to-video pipeline to generate a video output, likely using the https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt model