Update README.md
README.md
---
license: apache-2.0
datasets:
- peteromallet/high-quality-midjouney-srefs
base_model:
- Qwen/Qwen-Image-Edit
tags:
- image
- editing
- lora
- scene-generation
- qwen
pipeline_tag: image-to-image
library_name: diffusers
---

# QwenEdit InScene LoRAs (Beta)

## Model Description

**InScene** and **InScene Annotate** are a pair of LoRA fine-tunes for QwenEdit that enhance its ability to generate images based on scene references. The two models work together to provide flexible scene-based image generation, with optional annotation support.

### InScene
The main model generates images based on the scene composition and layout of a reference image. InScene is trained on pairs of different shots within the same scene, along with prompts describing the desired output. Its goal is to create entirely new shots within a scene while maintaining character consistency and scene coherence.

InScene is intentionally biased towards creating completely new shots rather than making minor edits. This design choice overcomes Qwen-Image-Edit's internal bias toward small, conservative edits, enabling more dramatic scene transformations while preserving the characters and overall scene identity.

![InScene Example](example_inscene.jpg)

### InScene Annotate
InScene Annotate is trained on images with green rectangles drawn over specific regions. The model learns to generate images showing the subject inside the green rectangle. Rather than zooming in precisely on the marked region, it is trained to flexibly interpret instructions about what to show from that area, capturing the subject, context, and framing in a natural, composed way rather than as a strict crop.

![InScene Annotate Example](example_annotate.jpg)

*InScene and InScene Annotate are currently in beta.*

## How to Use

### InScene
To use the base InScene model, start your prompt with:

`Make an image in this scene of `

And then describe what you want to generate.

For example:
`Make an image in this scene of a bustling city street at night.`

### InScene Annotate
For the annotate variant, use annotated reference images and start your prompt with:

`Based on this annotated scene, create `

For example:
`Based on this annotated scene, create a winter landscape with snow-covered mountains.`
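
An annotated reference image is just a regular image with a green rectangle drawn over the region you want the new shot to focus on (see the Model Description above). This repo doesn't ship an annotation helper, so here is a minimal sketch using Pillow; the file names and line width are illustrative assumptions, only the green rectangle itself comes from the training setup described above.

```python
from PIL import Image, ImageDraw

# Load a reference shot and draw a green box over the region of interest.
# The green rectangle matches the annotation described above; the exact
# line width used during training is not documented, so 8 px is a guess.
image = Image.open("scene_reference.png").convert("RGB")
draw = ImageDraw.Draw(image)
draw.rectangle((420, 180, 760, 520), outline=(0, 255, 0), width=8)  # (left, top, right, bottom)
image.save("scene_reference_annotated.png")
```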

### Use with diffusers

**InScene:**
```python
import torch
from diffusers import QwenImageEditPipeline

pipe = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)
pipe.to("cuda")

pipe.load_lora_weights("peteromallet/Qwen-Image-Edit-InScene", weight_name="InScene-0.7.safetensors")
```
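
Once the LoRA is loaded, generation follows the standard Qwen-Image-Edit flow: pass a reference image from the scene plus a prompt that starts with the `Make an image in this scene of ` prefix. A minimal sketch continuing from the block above; the file name, seed, step count, and `true_cfg_scale` value are illustrative rather than recommended settings.

```python
from PIL import Image

# A reference shot from the scene you want new shots of (hypothetical path).
scene = Image.open("scene_reference.png").convert("RGB")

result = pipe(
    image=scene,
    prompt="Make an image in this scene of a bustling city street at night.",
    negative_prompt=" ",
    true_cfg_scale=4.0,
    num_inference_steps=50,
    generator=torch.Generator(device="cuda").manual_seed(0),
).images[0]
result.save("inscene_output.png")
```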

**InScene Annotate:**
```python
import torch
from diffusers import QwenImageEditPipeline

pipe = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)
pipe.to("cuda")

pipe.load_lora_weights("peteromallet/Qwen-Image-Edit-InScene", weight_name="InScene-Annotate-0.7.safetensors")
```
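
Inference is the same as for InScene, except the reference image should already carry the green-rectangle annotation (for example, the one produced by the Pillow sketch earlier) and the prompt uses the annotate prefix. Continuing from the block above, with placeholder paths and settings:

```python
from PIL import Image

# Reference image with the green rectangle already drawn on it.
annotated = Image.open("scene_reference_annotated.png").convert("RGB")

result = pipe(
    image=annotated,
    prompt="Based on this annotated scene, create a winter landscape with snow-covered mountains.",
    negative_prompt=" ",
    true_cfg_scale=4.0,
    num_inference_steps=50,
    generator=torch.Generator(device="cuda").manual_seed(0),
).images[0]
result.save("inscene_annotate_output.png")
```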

### Strengths & Weaknesses

The models excel at:
- Capturing scene composition and spatial layout from reference images
- Maintaining consistent scene structure while varying content
- Understanding spatial relationships between elements
- Strong prompt adherence with scene-aware generation
- (Annotate) Precise control using annotated references

The models may struggle with:
- Very complex multi-layered scenes with numerous elements
- Extremely abstract or non-traditional scene compositions
- Fine-grained details that conflict with the reference scene layout
- Occasional depth perception issues

## Training Data

The InScene and InScene Annotate LoRAs were trained on a curated dataset of high-quality Midjourney style references, with additional scene-focused annotations for the Annotate variant. The dataset emphasizes diverse scene compositions and spatial relationships.

You can find the public dataset used for training here:
[https://huggingface.co/datasets/peteromallet/high-quality-midjouney-srefs](https://huggingface.co/datasets/peteromallet/high-quality-midjouney-srefs)
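
If you want to browse the training images programmatically, the dataset can be pulled from the Hub with the `datasets` library. A minimal sketch; the split name and column layout are assumptions, since they aren't documented here.

```python
from datasets import load_dataset

# Pull the public sref dataset from the Hub; the "train" split name is an assumption.
ds = load_dataset("peteromallet/high-quality-midjouney-srefs", split="train")
print(ds)            # features and row count
print(ds[0].keys())  # columns of the first record
```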

## Links

- Model: [https://huggingface.co/peteromallet/Qwen-Image-Edit-InScene](https://huggingface.co/peteromallet/Qwen-Image-Edit-InScene)
- Dataset: [https://huggingface.co/datasets/peteromallet/high-quality-midjouney-srefs](https://huggingface.co/datasets/peteromallet/high-quality-midjouney-srefs)