Vikramjeet Singh
license: mit
sdk: docker
emoji: 🚀
colorFrom: blue
colorTo: green
pinned: false
short_description: PicPilot Production Server

🚀 PicPilot


PicPilot: Generate Stunning Photography and Craft Visual Narratives in seconds for your Brand

📖 Overview

PicPilot is a scalable solution that uses state-of-the-art text-to-image models to extend and enhance images. The project has evolved through multiple iterations, addressing challenges and improving output quality at each stage.

Key Features:

  • Segmentation using Segment Anything ViT-Huge and YOLOv8s
  • High-quality outpainting with ControlNet + ZoeDepth
  • Stable Video Diffusion support
  • Batch API support and event-driven queue support
  • Logging and telemetry using Logfire
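The batch and event-driven queue features listed above can be pictured with a small sketch. This is not PicPilot's actual implementation: `run_batch`, its parameters, and the `handler` callback are hypothetical names, and the sketch uses only Python's standard `queue` and `threading` modules to show how queued jobs might be fanned out to workers.

```python
import queue
import threading
from typing import Callable


def run_batch(jobs: list[str], handler: Callable[[str], str], workers: int = 2) -> dict[str, str]:
    """Push jobs onto a queue and let worker threads process them.

    Hypothetical helper for illustration; returns results keyed by job.
    """
    pending: queue.Queue = queue.Queue()
    results: dict[str, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            job = pending.get()
            if job is None:  # sentinel: no more work for this worker
                pending.task_done()
                return
            try:
                out = handler(job)
            except Exception as exc:  # record the failure and keep the worker alive
                out = f"error: {exc}"
            with lock:
                results[job] = out
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:
        pending.put(job)
    for _ in threads:
        pending.put(None)  # one sentinel per worker
    pending.join()
    for t in threads:
        t.join()
    return results
```

In a real deployment the `handler` would wrap a model call, and the in-process queue would likely be replaced by an external message broker; both choices are assumptions here.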

🏗 Architecture

[Image: architecture diagram]

Current Pipeline

  1. Object Detection: YOLOv8l provides accurate bounding boxes
  2. Segmentation: Segment Anything ViT-Huge creates precise masks with ROI extension
  3. Outpainting: ControlNet ZoeDepth + Realistic Vision XL
  4. Image-to-Video Generation: I2VGen-XL
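The four stages above can be sketched as a chain of pluggable callables. This is a minimal illustration under stated assumptions, not the project's code: `Pipeline`, `extend_roi`, the `margin` parameter, and the stage signatures are all hypothetical, and "ROI extension" is interpreted here as growing the detected bounding box by a fixed fraction before segmentation, which the README does not confirm.

```python
from dataclasses import dataclass
from typing import Any, Callable

Box = tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def extend_roi(box: Box, width: int, height: int, margin: float = 0.1) -> Box:
    """Grow a box by `margin` of its size on each side, clamped to the image bounds."""
    x0, y0, x1, y1 = box
    dx = int((x1 - x0) * margin)
    dy = int((y1 - y0) * margin)
    return (max(0, x0 - dx), max(0, y0 - dy), min(width, x1 + dx), min(height, y1 + dy))


@dataclass
class Pipeline:
    """Hypothetical wiring of the four stages; each field is a stand-in for a model."""
    detect: Callable[[Any], Box]              # e.g. a YOLOv8 wrapper returning one box
    segment: Callable[[Any, Box], Any]        # e.g. a Segment Anything wrapper returning a mask
    outpaint: Callable[[Any, Any, str], Any]  # e.g. a ControlNet-based outpainting wrapper
    animate: Callable[[Any], Any]             # e.g. an image-to-video wrapper

    def run(self, image: Any, width: int, height: int, prompt: str) -> Any:
        box = self.detect(image)
        roi = extend_roi(box, width, height)
        mask = self.segment(image, roi)
        extended = self.outpaint(image, mask, prompt)
        return self.animate(extended)
```

Keeping each stage behind a plain callable makes it possible to swap a model or stub one out in tests without touching the orchestration logic.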

🧠 Models used

📸 Results

Here are some impressive results from our pipeline:

[Images: Cooker Output, Toaster Output, Chair Output, Tent Output, Cycle Output]

📊 Experimentation & Improvements

For detailed insights into our experimentation process, check out our Weights & Biases Report.

Recent improvements:

  • ✅ Deployed model as an API for batch processing
  • ✅ Implemented UI using Gradio
  • ✅ Integrated image-to-video model pipeline using Stable Video Diffusion

🎥 Sample Video

Check out our short demo video to see PicPilot in action:

https://github.com/VikramxD/product_diffusion_api/assets/72499426/c935ec2d-cb76-49dd-adae-8aa4feac211e


📄 License: MIT