metadata
license: mit
sdk: docker
emoji: π
colorFrom: blue
colorTo: green
pinned: false
short_description: PicPilot Production Server
PicPilot
PicPilot: Generate Stunning Photography and Craft Visual Narratives in seconds for your Brand
Overview
PicPilot is a scalable solution that uses state-of-the-art text-to-image models to extend and enhance images. The project has gone through multiple iterations, each one addressing earlier limitations and improving output quality.
Key Features:
- Segmentation using Segment Anything ViT Huge and YOLOv8s
- High-quality outpainting with ControlNet + ZoeDepth
- Stable Video Diffusion support
- Batch API support and event-driven queue support
- Logging and telemetry using Logfire
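As an illustration of the batch and queue features listed above, here is a minimal sketch of an event-driven job queue built on `asyncio`. It is not PicPilot's actual implementation: the names `Job`, `fake_generate`, `worker`, and `run_batch` are hypothetical, and `fake_generate` is a stand-in for a real model call.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    prompt: str


async def fake_generate(job: Job) -> str:
    # Hypothetical placeholder for a model call; returns a label, not an image.
    await asyncio.sleep(0)
    return f"image-for:{job.prompt}"


async def worker(queue: asyncio.Queue, results: dict) -> None:
    # Pull jobs until cancelled; always mark the job done so join() can return.
    while True:
        job = await queue.get()
        try:
            results[job.job_id] = await fake_generate(job)
        finally:
            queue.task_done()


async def run_batch(prompts: list[str], num_workers: int = 2) -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    results: dict[int, str] = {}
    for i, prompt in enumerate(prompts):
        queue.put_nowait(Job(i, prompt))

    workers = [
        asyncio.create_task(worker(queue, results)) for _ in range(num_workers)
    ]
    await queue.join()  # wait until every queued job has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)

    # Return results in the order the prompts were submitted.
    return [results[i] for i in range(len(prompts))]
```

Calling `asyncio.run(run_batch(["red sofa", "blue mug"]))` returns one result per prompt, in submission order, regardless of which worker handled each job.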
Architecture
Current Pipeline
- Object Detection: YOLOv8l provides accurate bounding boxes
- Segmentation: Segment Anything VIT Huge creates precise masks with ROI extension
- Outpainting: ControlNet Zoe Depth + Realistic Vision XL
- I2VGen-XL: image-to-video generation
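The geometry behind two of the stages above can be sketched in plain Python: extending a detector's bounding box before segmentation, and laying out a larger canvas for outpainting. This is a sketch under assumptions, not the project's code. The helper names `extend_roi` and `outpaint_layout`, the 10% default margin, and the centred placement are all illustrative choices.

```python
def extend_roi(box, image_size, margin=0.1):
    """Grow an (x1, y1, x2, y2) box by `margin` of its size, clamped to the image."""
    x1, y1, x2, y2 = box
    width, height = image_size
    pad_x = int(round((x2 - x1) * margin))
    pad_y = int(round((y2 - y1) * margin))
    return (
        max(0, x1 - pad_x),
        max(0, y1 - pad_y),
        min(width, x2 + pad_x),
        min(height, y2 + pad_y),
    )


def outpaint_layout(image_size, target_size):
    """Centre the source image on a larger canvas.

    Returns the paste offset and the box of pixels to keep; everything
    outside that box is the region an outpainting model would fill.
    """
    width, height = image_size
    target_w, target_h = target_size
    if target_w < width or target_h < height:
        raise ValueError("target canvas must be at least as large as the image")
    left = (target_w - width) // 2
    top = (target_h - height) // 2
    return (left, top), (left, top, left + width, top + height)
```

The extended box would be passed to the segmentation model as its prompt region, and the keep-box would be inverted to form the outpainting mask.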
Models used
- Stable Diffusion Inpainting
- Kandinsky 2.2 Decoder Inpaint
- Stable Diffusion XL
- ControlNet-Inpaint Dreamer
- ControlNet Zoe Depth
- Realistic Vision XL
- I2VGen-XL
Results
Here are some impressive results from our pipeline:
Experimentation & Improvements
For detailed insights into our experimentation process, check out our Weights & Biases Report.
Recent improvements:
- Deployed model as an API for batch processing
- Implemented UI using Gradio
- Integrated image-to-video model pipeline using Stable Video Diffusion
Sample Video
Check out our short demo video to see PicPilot in action:
License: MIT