metadata
license: mit
sdk: docker
colorFrom: blue
colorTo: green
pinned: false
short_description: PicPilot Production Server
PicPilot
PicPilot: Generate stunning photography and craft visual narratives for your brand in seconds
Overview
PicPilot is a scalable solution that uses state-of-the-art text-to-image models to extend and enhance images and to create product photography. The project has evolved through multiple iterations, each addressing earlier challenges and improving output quality.
Key Features:
- Segmentation using Segment Anything (ViT-Huge) and YOLOv8s
- High-quality outpainting with ControlNet + ZoeDepth
- Stable Video Diffusion support
- Batch API and event-driven queue support
- Logging and telemetry using Logfire
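The batch API and event-driven queue are not described in detail here. As a rough illustration only, the sketch below shows one common way to build such a queue with Python's `asyncio` (3.9+): requests are enqueued as jobs, a fixed pool of workers consumes them, and blocking model calls run in a thread. `Job`, `ResultStore`, `process_image`, and `run_jobs` are hypothetical names, and `process_image` stands in for the real model pipeline.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job: one API request carrying several image IDs."""
    job_id: str
    image_ids: list[str]


@dataclass
class ResultStore:
    """Hypothetical in-memory stand-in for wherever finished results are persisted."""
    results: dict[str, list[str]] = field(default_factory=dict)
    errors: dict[str, str] = field(default_factory=dict)


def process_image(image_id: str) -> str:
    # Hypothetical placeholder for the blocking model call (segmentation, outpainting, ...).
    return f"processed-{image_id}"


async def worker(queue: asyncio.Queue, store: ResultStore) -> None:
    # Each worker waits until a job arrives on the queue, then processes it.
    while True:
        job = await queue.get()
        try:
            # Run blocking work in a thread so the event loop can keep accepting requests.
            store.results[job.job_id] = [
                await asyncio.to_thread(process_image, image_id) for image_id in job.image_ids
            ]
        except Exception as exc:
            store.errors[job.job_id] = repr(exc)
        finally:
            queue.task_done()


async def run_jobs(jobs: list[Job], num_workers: int = 2) -> ResultStore:
    queue: asyncio.Queue = asyncio.Queue()
    store = ResultStore()
    workers = [asyncio.create_task(worker(queue, store)) for _ in range(num_workers)]
    for job in jobs:
        await queue.put(job)  # producer side: enqueue each request as it arrives
    await queue.join()  # block until every job has been marked done
    for task in workers:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return store


store = asyncio.run(run_jobs([Job("a", ["img1", "img2"]), Job("b", ["img3"])]))
print(store.results["a"])  # ['processed-img1', 'processed-img2']
```

Waiting on `queue.join()` and then cancelling the workers is a simple shutdown pattern for a one-off batch; a long-running server would instead keep the workers alive and push jobs from its request handlers.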
Architecture
Current Pipeline
- Object detection: YOLOv8l provides accurate bounding boxes
- Segmentation: Segment Anything (ViT-Huge) creates precise masks with ROI (region of interest) extension
- Outpainting: ControlNet Zoe Depth + Realistic Vision XL
- Image-to-video: I2VGen-XL generation pipeline
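The README does not define "ROI extension". One plausible reading, sketched below purely as an assumption, is growing each YOLOv8 bounding box by a margin before it is used as the box prompt for Segment Anything, so that thin edges of the product are not cut off. `extend_roi` and its `margin` parameter are hypothetical, and boxes are assumed to be `(x1, y1, x2, y2)` pixel coordinates.

```python
import math


def extend_roi(box, image_width, image_height, margin=0.1):
    """Hypothetical helper: grow an (x1, y1, x2, y2) box by `margin` of its
    width/height on every side, then clamp it to the image bounds."""
    x1, y1, x2, y2 = box
    if x2 <= x1 or y2 <= y1:
        raise ValueError("box must have positive width and height")
    dx = (x2 - x1) * margin
    dy = (y2 - y1) * margin
    # Round outward (floor the top-left, ceil the bottom-right) so the region never shrinks.
    return (
        max(0, math.floor(x1 - dx)),
        max(0, math.floor(y1 - dy)),
        min(image_width, math.ceil(x2 + dx)),
        min(image_height, math.ceil(y2 + dy)),
    )


# Example: a detection near the top edge of a 640x480 image.
print(extend_roi((100, 50, 300, 250), 640, 480, margin=0.25))  # (50, 0, 350, 300)
```

The extended box could then be passed, for example, as the `box` argument of `SamPredictor.predict` in the `segment_anything` package; whether PicPilot does exactly this is not stated here.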
Models Used
- Stable Diffusion Inpainting
- Kandinsky 2.2 Decoder Inpaint
- Stable Diffusion XL
- ControlNet Inpaint Dreamer
- ControlNet Zoe Depth
- Realistic Vision XL
- I2VGen-XL
Results
Here are some impressive results from our pipeline:
Experimentation & Improvements
For detailed insights into our experimentation process, check out our Weights & Biases Report.
Recent improvements:
- Deployed the model as an API for batch processing
- Implemented a UI using Gradio
- Integrated an image-to-video pipeline using I2VGen-XL
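The internals of the batch API are not documented here. A typical pattern, shown below purely as an assumption, is to split an incoming batch into GPU-sized micro-batches and isolate failures so that one bad input does not fail the whole request. `chunked`, `process_batch_request`, `run_batch`, and `max_batch_size` are hypothetical names; `run_batch` stands in for a model call that returns one output per input.

```python
from typing import Any, Callable, Iterator, Sequence


def chunked(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_batch_request(
    items: Sequence[Any],
    run_batch: Callable[[Sequence[Any]], list],
    max_batch_size: int = 4,
) -> list[dict]:
    """Hypothetical batch handler: one result record per input, in input order."""
    results = []
    for chunk in chunked(items, max_batch_size):
        try:
            outputs = run_batch(chunk)
            if len(outputs) != len(chunk):
                raise ValueError("run_batch must return one output per input")
            results.extend(
                {"input": item, "output": out, "error": None}
                for item, out in zip(chunk, outputs)
            )
        except Exception as exc:
            # Record the failure for every item in this chunk and keep going.
            results.extend({"input": item, "output": None, "error": str(exc)} for item in chunk)
    return results
```

Failure isolation here is per chunk, so one bad image fails its whole micro-batch; rerunning only the failed items with `max_batch_size=1` is a simple way to narrow the failure down to individual inputs.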
Sample Video
Check out our short demo video to see PicPilot in action:
License: MIT