Vikramjeet Singh
metadata
license: mit
sdk: docker
emoji: 🚀
colorFrom: blue
colorTo: green
pinned: false
short_description: PicPilot Production Server

🚀 PicPilot


PicPilot: Generate Stunning Photography and Craft Visual Narratives for Your Brand in Seconds

📖 Overview

PicPilot is a scalable solution that uses state-of-the-art text-to-image models to extend and enhance images and to create product photography. The project has evolved through multiple iterations, addressing challenges and improving output quality at each stage.

Key Features:

  • Segmentation using Segment Anything ViT-Huge and YOLOv8s
  • High-quality outpainting with ControlNet + ZoeDepth
  • Stable Video Diffusion support
  • Batch API support and event-driven queue support
  • Logging and telemetry using Logfire
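
The features above list batch API and event-driven queue support but do not describe the mechanism. As a minimal sketch, assuming an in-process asyncio queue (the README does not name PicPilot's actual queue backend), each caller submits a job and awaits a future, while a single worker groups waiting jobs into batches of at most `max_batch`. `run_model_on_batch`, `batch_worker`, `submit`, `max_batch`, and `max_wait` are hypothetical names, not the project's API.

```python
import asyncio


async def run_model_on_batch(prompts):
    """Hypothetical stand-in for one batched diffusion call."""
    await asyncio.sleep(0)  # yield to the loop, as a real async GPU call would
    return [f"image for {p!r}" for p in prompts]


async def batch_worker(queue, batch_sizes, max_batch=4, max_wait=0.05):
    """Group queued (prompt, future) jobs into batches of at most max_batch."""
    while True:
        batch = [await queue.get()]  # block until the first job arrives
        await asyncio.sleep(max_wait)  # give other producers a moment to enqueue
        while len(batch) < max_batch:
            try:
                batch.append(queue.get_nowait())  # drain without waiting
            except asyncio.QueueEmpty:
                break
        batch_sizes.append(len(batch))
        outputs = await run_model_on_batch([prompt for prompt, _ in batch])
        for (_, future), output in zip(batch, outputs):
            future.set_result(output)  # wake the caller awaiting this job


async def submit(queue, prompt):
    """Enqueue one job and wait until the worker resolves its future."""
    future = asyncio.get_running_loop().create_future()
    await queue.put((prompt, future))
    return await future


async def main(n_jobs=10):
    queue = asyncio.Queue()
    batch_sizes = []
    worker = asyncio.create_task(batch_worker(queue, batch_sizes))
    prompts = [f"product shot {i}" for i in range(n_jobs)]
    results = await asyncio.gather(*(submit(queue, p) for p in prompts))
    worker.cancel()  # stop the worker once every job has been answered
    return results, batch_sizes


if __name__ == "__main__":
    results, batch_sizes = asyncio.run(main())
    print(batch_sizes)
    print(results[0])
```

Waiting briefly (`max_wait`) before draining the queue trades a little latency for larger batches. This sketch only covers a single process; a multi-worker deployment would need a shared broker, which it does not model.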

πŸ— Architecture

[Architecture diagram]

Current Pipeline

  1. Object detection: YOLOv8l provides accurate bounding boxes
  2. Segmentation: Segment Anything ViT-Huge creates precise masks with region-of-interest (ROI) extension
  3. Outpainting: ControlNet (ZoeDepth) + Realistic Vision XL
  4. Video generation: I2VGen-XL image-to-video pipeline
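
The pipeline hands a detected box to Segment Anything and the resulting mask to the outpainting stage, but the README does not define "ROI extension" or how the mask is prepared. The sketch below is minimal and rests on two assumptions. First, ROI extension means padding the YOLO box by a fraction of its size, clamped to the image, before segmentation. Second, outpainting keeps the segmented product pixels and regenerates everything else on a larger canvas. `extend_roi`, `background_mask`, `margin`, and the `(x0, y0, x1, y1)` box format are illustrative assumptions, not PicPilot's code.

```python
import math


def extend_roi(box, image_w, image_h, margin=0.1):
    """Grow an (x0, y0, x1, y1) box by `margin` x its width/height per side, clamped to the image."""
    x0, y0, x1, y1 = box
    dx, dy = (x1 - x0) * margin, (y1 - y0) * margin
    return (
        max(0, math.floor(x0 - dx)),
        max(0, math.floor(y0 - dy)),
        min(image_w, math.ceil(x1 + dx)),
        min(image_h, math.ceil(y1 + dy)),
    )


def background_mask(object_mask, canvas_w, canvas_h):
    """Centre a binary object mask on a larger canvas and invert it.

    Returns ((left, top), mask): mask[y][x] == 1 marks pixels the outpainting
    model should generate, 0 marks segmented product pixels to keep.
    """
    obj_h, obj_w = len(object_mask), len(object_mask[0])
    if obj_w > canvas_w or obj_h > canvas_h:
        raise ValueError("canvas must be at least as large as the object mask")
    left, top = (canvas_w - obj_w) // 2, (canvas_h - obj_h) // 2
    mask = [[1] * canvas_w for _ in range(canvas_h)]  # start by repainting everything
    for y, row in enumerate(object_mask):
        for x, value in enumerate(row):
            if value:
                mask[top + y][left + x] = 0  # keep the product pixel
    return (left, top), mask


if __name__ == "__main__":
    print(extend_roi((40, 30, 140, 130), image_w=160, image_h=150, margin=0.25))  # (15, 5, 160, 150)
    offset, mask = background_mask([[1, 1], [1, 0]], canvas_w=4, canvas_h=4)
    print(offset)
    for row in mask:
        print(row)
```

Flooring the lower corner and ceiling the upper corner keeps rounding from shrinking the padded box, and clamping keeps it inside the source image.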

🧠 Models Used

📸 Results

Here are some impressive results from our pipeline:

[Sample outputs: cooker, toaster, chair, tent, and cycle]

📊 Experimentation & Improvements

For detailed insights into our experimentation process, check out our Weights & Biases Report.

Recent improvements:

  • ✅ Deployed the model as an API for batch processing
  • ✅ Implemented a UI using Gradio
  • ✅ Integrated an image-to-video pipeline using I2VGen-XL

🎥 Sample Video

Check out our short demo video to see PicPilot in action:

https://github.com/VikramxD/product_diffusion_api/assets/72499426/c935ec2d-cb76-49dd-adae-8aa4feac211e


📄 License: MIT