metadata

license: mit
sdk: docker
emoji: 🚀
colorFrom: blue
colorTo: green
pinned: false
short_description: PicPilot Production Server

🚀 PicPilot

PicPilot: Generate Stunning Photography and Craft Visual Narratives for Your Brand in Seconds

📖 Overview

PicPilot is a scalable solution that leverages state-of-the-art text-to-image models to extend and enhance images and create product photography. This project has evolved through multiple iterations, addressing challenges and improving output quality at each stage.

Key Features:

Segmentation using Segment Anything ViT Huge and YOLOv8s
High-quality outpainting with ControlNet + ZoeDepth
Stable Video Diffusion support
Batch API support and event-driven queue support
Logging and telemetry using Logfire
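
As a rough illustration of how batch support and a queue can fit together, the sketch below groups pending prompts from an in-process queue into small batches. It uses only Python's standard `queue` module. The names `drain_batch` and `max_batch`, and the use of plain string prompts, are hypothetical choices for this example and are not taken from the PicPilot codebase, which may rely on a different queueing system entirely.

```python
import queue
from typing import List


def drain_batch(jobs: "queue.Queue[str]", max_batch: int, timeout: float = 0.1) -> List[str]:
    """Wait briefly for one job, then take whatever else is ready, up to max_batch.

    Hypothetical helper: returns an empty list when no job arrives in time.
    """
    batch: List[str] = []
    try:
        # Block (up to `timeout` seconds) for the first job only.
        batch.append(jobs.get(timeout=timeout))
        # Take further jobs only if they are already waiting.
        while len(batch) < max_batch:
            batch.append(jobs.get_nowait())
    except queue.Empty:
        pass
    return batch


# Example: three queued prompts are served as batches of at most two.
pending: "queue.Queue[str]" = queue.Queue()
for prompt in ["red sneaker on sand", "watch on marble", "perfume in forest"]:
    pending.put(prompt)

first = drain_batch(pending, max_batch=2)
second = drain_batch(pending, max_batch=2)
```

A worker loop would call `drain_batch` repeatedly and pass each non-empty batch to the model in a single call.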

🏗 Architecture

Current Pipeline

Object Detection: YOLOv8l provides accurate bounding boxes
Segmentation: Segment Anything ViT Huge creates precise masks with ROI extension
Outpainting: ControlNet ZoeDepth + Realistic Vision XL
I2VGen-XL: Image-to-video generation pipeline
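
The hand-off between detection, segmentation, and outpainting can be sketched with plain Python. This is a minimal illustration, assuming that "ROI extension" means padding the detected bounding box by a fraction of its size, and that the outpainting mask follows the common convention of 255 for pixels to generate and 0 for pixels to keep. The function names `extend_roi` and `outpaint_mask` and the `margin` parameter are hypothetical and do not come from the PicPilot source.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def extend_roi(box: Box, image_size: Tuple[int, int], margin: float = 0.1) -> Box:
    """Pad a box by `margin` (a fraction of its width and height) on each side.

    The result is clamped to the image bounds given as (width, height).
    """
    x0, y0, x1, y1 = box
    width, height = image_size
    pad_x = round((x1 - x0) * margin)
    pad_y = round((y1 - y0) * margin)
    return (
        max(0, x0 - pad_x),
        max(0, y0 - pad_y),
        min(width, x1 + pad_x),
        min(height, y1 + pad_y),
    )


def outpaint_mask(image_size: Tuple[int, int], keep: Box) -> List[List[int]]:
    """Build a row-major mask: 0 inside `keep` (preserve), 255 elsewhere (generate)."""
    width, height = image_size
    x0, y0, x1, y1 = keep
    return [
        [0 if (x0 <= x < x1 and y0 <= y < y1) else 255 for x in range(width)]
        for y in range(height)
    ]


# Example: a detector box (assumed values) padded by 25% inside a 55x100 image.
roi = extend_roi((10, 10, 50, 30), image_size=(55, 100), margin=0.25)
mask = outpaint_mask((4, 3), keep=(1, 1, 3, 2))
```

In a real pipeline the box would come from the detector, the keep region would be the segmentation mask itself rather than a rectangle, and the resulting mask would be passed to the outpainting model along with the depth map.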

🧠 Models used

📸 Results

Here are some impressive results from our pipeline:

📊 Experimentation & Improvements

For detailed insights into our experimentation process, check out our Weights & Biases Report.

Recent improvements:

✅ Deployed model as an API for batch processing
✅ Implemented UI using Gradio
✅ Integrated an image-to-video pipeline using I2VGen-XL

🎥 Sample Video

Check out our short demo video to see PicPilot in action:

https://github.com/VikramxD/product_diffusion_api/assets/72499426/c935ec2d-cb76-49dd-adae-8aa4feac211e

📄 License: MIT