
Outpainting Pipeline

  • The initial version of the project used a YOLOv8s segmentation model to produce a rough mask, which was then inverted for outpainting with models like the Stable Diffusion inpainting model from RunwayML, along with ControlNet to control the outpainted output.
  • That approach hit blockers: the mask was of poor quality, with rough edges that interfered with the outpainting output, and even with techniques like blurring the mask, the initial output quality from the Stable Diffusion models was poor.
  • To address this, the pipeline was changed to detect the ROI of the object in focus and to extend and resize the image; the segmentation model was upgraded to Segment Anything ViT-Huge, with a YOLOv8l model providing the bounding boxes used as box prompts, and the resulting mask was inverted for outpainting (see the sketch after this list).
  • The model was changed to kandinsky-v2.2-decoder-inpaint with 800 inference steps and a guidance scale of 5.0 to 7.5, and the following results were achieved.
  • GPU used: NVIDIA A100 40 GB.
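
A minimal sketch of that detection → segmentation → inversion → inpainting flow, assuming the ultralytics, segment-anything, and diffusers packages. The checkpoint paths, prompt, and image paths are placeholders, the Kandinsky ID is the community inpainting checkpoint on the Hub, and the ROI extension/resize step is omitted for brevity:

```python
import numpy as np
import torch
from PIL import Image
from ultralytics import YOLO
from segment_anything import sam_model_registry, SamPredictor
from diffusers import AutoPipelineForInpainting

image = Image.open("product.jpg").convert("RGB")  # placeholder input path
np_image = np.array(image)

# 1. YOLOv8l detects the object in focus and supplies a bounding box
#    (assumes at least one detection is returned).
detector = YOLO("yolov8l.pt")
box = detector(np_image)[0].boxes.xyxy[0].cpu().numpy()  # xyxy format

# 2. SAM ViT-Huge turns the bounding box into a precise mask via a box prompt.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(np_image)
masks, _, _ = predictor.predict(box=box, multimask_output=False)

# 3. Invert the mask so the background, not the object, gets regenerated.
inverted = Image.fromarray((~masks[0]).astype(np.uint8) * 255)

# 4. Kandinsky 2.2 decoder inpainting fills the inverted region.
pipe = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint",
    torch_dtype=torch.float16,
).to("cuda")
result = pipe(
    prompt="product on a marble counter, studio lighting",  # example prompt
    negative_prompt="blurry, distorted, low quality",
    image=image,
    mask_image=inverted,
    num_inference_steps=800,
    guidance_scale=7.5,
).images[0]
result.save("outpainted.jpg")
```

Inverting the SAM mask is what turns an inpainting pipeline into an outpainting one: the object in focus is preserved and only the surrounding region is generated.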

ARCHITECTURE

[Architecture diagram (Architecture.drawio)]

Installation

To install the necessary requirements, you can use pip:

```bash
pip install -r requirements.txt
wandb login
huggingface-cli login
cd scripts
```

This will install all the necessary libraries for this project, including PIL, Diffusers, Segment Anything, and wandb. Then run the pipeline:

```bash
python run.py --image_path /path/to/image.jpg --prompt 'prompt' --negative_prompt 'negative prompt' --output_dir /path/to/output --mask_dir /path/to/mask --uid unique_id
```

MODELS USED

Experimentation was carried out with the following models:

  • YOLOv8s (segmentation) and YOLOv8l (detection)
  • Segment Anything ViT-Huge
  • Stable Diffusion inpainting (RunwayML) with ControlNet
  • kandinsky-v2.2-decoder-inpaint

WEIGHTS AND BIASES EXPERIMENTATION REPORT

WANDB REPORT

[Sample outputs: cooker, toaster, chair, tent, cycle]

Some Improvements

  • Working on an API to deploy this model in batch mode, adding loggers for the prompt and the generated output.
  • Implementing a UI in Gradio / Streamlit to try the model out visually.
  • Experimenting with an image-to-video model pipeline to generate a video output, likely using the (https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt) model (see the sketch after this list).
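
A minimal sketch of that image-to-video step with diffusers' StableVideoDiffusionPipeline, assuming a CUDA GPU; the input and output paths are placeholders:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the img2vid-xt checkpoint referenced above in half precision.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Condition on one outpainted frame; the model expects 1024x576 input.
image = load_image("outpainted.jpg").resize((1024, 576))

# decode_chunk_size trades VRAM for speed when decoding latents to frames.
frames = pipe(image, decode_chunk_size=8).frames[0]
export_to_video(frames, "product_video.mp4", fps=7)
```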

Short Video

https://github.com/VikramxD/product_diffusion_api/assets/72499426/c935ec2d-cb76-49dd-adae-8aa4feac211e