---
frameworks: PyTorch
license: Apache License 2.0
tags: []
tasks:
  - text-to-image-synthesis
base_model:
  - Qwen/Qwen-Image-Layered
base_model_relation: finetune
---

Qwen-Image-Layered

Model Overview

This model is fine-tuned from Qwen/Qwen-Image-Layered on the artplus/PrismLayersPro dataset. A text prompt controls which content is split out of the input image as a layer.

For more details on the training strategy and implementation, see our technical blog.

Usage Tips

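The model architecture has been changed from multi-image output to single-image output: it returns only the layer that matches the text description.
The model was trained only on English prompts, but it retains the Chinese understanding inherited from the base model.
The native training resolution is 1024x1024; inference at other resolutions is also supported.
The model has difficulty separating multiple entities that occlude each other, such as the cartoon skull and the hat in the examples.
The model is good at decomposing posters into layers but less suited to photographs, especially photos with pronounced lighting and shadows.
The model supports negative prompts, which describe content that should not appear in the result.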
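The inference code resizes every input to a 1024x1024 square, which distorts non-square images, even though other resolutions are supported. The sketch below picks an aspect-preserving width and height with roughly the same pixel count as the native 1024x1024. Rounding to a multiple of 16 is an assumption (a common constraint for latent diffusion models with an 8x VAE and 2x2 patches), not something stated in this README, and `fit_to_area` is a hypothetical helper name.

```python
import math


def fit_to_area(width: int, height: int, target_area: int = 1024 * 1024,
                multiple: int = 16) -> tuple[int, int]:
    """Return (new_width, new_height) with about `target_area` pixels and the same aspect ratio.

    Assumption: both dimensions are rounded to a multiple of `multiple` (16 by default).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Uniform scale factor that maps the current area onto the target area.
    scale = math.sqrt(target_area / (width * height))
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h


# Usage (PIL's Image.size is (width, height)):
# w, h = fit_to_area(*input_image.size)
# input_image = input_image.resize((w, h))
# ...then pass height=h, width=w to the pipeline call.
```

For example, a 1920x1080 image maps to 1360x768 instead of being squashed into a square.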
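Negative prompts are typically applied through classifier-free guidance: the prediction conditioned on the negative prompt takes the place of the unconditional one, and `cfg_scale` (4 in the inference code) sets how strongly the result is pushed away from it. The sketch below shows that standard formula on plain Python lists. It illustrates the general technique; we have not verified that DiffSynth-Studio implements it exactly this way, and `apply_cfg` is a hypothetical name.

```python
def apply_cfg(pred_positive: list[float], pred_negative: list[float],
              cfg_scale: float) -> list[float]:
    """Classifier-free guidance, element-wise: negative + cfg_scale * (positive - negative)."""
    if len(pred_positive) != len(pred_negative):
        raise ValueError("predictions must have the same length")
    return [n + cfg_scale * (p - n) for p, n in zip(pred_positive, pred_negative)]
```

With `cfg_scale=1` the result equals the positive-prompt prediction; larger values move further away from what the negative prompt describes. In DiffSynth-Studio's `QwenImagePipeline`, the negative prompt is, as far as we know, passed through a `negative_prompt` keyword; check the signature of your installed version before relying on it.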

Results

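Some output images contain pure-white text. ModelScope users can click the “☀︎” icon in the top-right corner of the page to switch to dark mode.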
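Outside ModelScope, white-text layers are easier to inspect on a dark background. A minimal sketch using Pillow's `Image.alpha_composite`, assuming the saved layer carries an alpha channel; the function name `preview_on_dark` and the background color are arbitrary choices, not part of the model or DiffSynth-Studio.

```python
from PIL import Image


def preview_on_dark(layer: Image.Image, background=(32, 32, 32)) -> Image.Image:
    """Composite an RGBA layer over an opaque dark background (for viewing only)."""
    layer = layer.convert("RGBA")  # alpha_composite needs both images in RGBA mode
    base = Image.new("RGBA", layer.size, (*background, 255))
    return Image.alpha_composite(base, layer)


# Usage with the output file written by the inference code:
# preview_on_dark(Image.open("image.png")).convert("RGB").save("image_preview.png")
```

Transparent pixels take the background color and opaque white text stays white, so the text becomes visible without editing the layer itself.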

Example 1

Input image

Prompt	Output image	Prompt	Output image
A solid, uniform color with no distinguishable features or objects		Text 'TRICK'
Cloud		Text 'TRICK OR TREAT'
A cartoon skeleton character wearing a purple hat and holding a gift box		Text 'TRICK OR'
A purple hat and a head		A gift box

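Example 2

Input image

Prompt	Output image	Prompt	Output image
蓝天，白云，一片花园，花园里有五颜六色的花 (Blue sky, white clouds, and a garden full of colorful flowers)		五彩的精致花环 (A colorful, delicate flower wreath)
少女、花环、小猫 (Girl, flower wreath, kitten)		少女、小猫 (Girl, kitten)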

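Example 3

Input image

Prompt	Output image	Prompt	Output image
一片湛蓝的天空和波涛汹涌的大海 (A clear blue sky and a rough, surging sea)		文字“向往的生活” (The text “向往的生活”)
一只海鸥 (A seagull)		文字“生活” (The text “生活”)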

Inference Code

Install DiffSynth-Studio:

```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```
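To confirm that the editable install is visible to the interpreter you will run inference with, a stdlib-only check like the following can help. It assumes the package is importable as `diffsynth`, which matches the import in the inference code; `is_importable` is a hypothetical helper name.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be found on sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Packages used by the inference code.
    for name in ("diffsynth", "torch", "PIL", "requests"):
        print(f"{name}: {'ok' if is_importable(name) else 'MISSING'}")
```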

Run inference:

```python
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
from PIL import Image
import requests
import torch

# Load the fine-tuned transformer from this repo, the text encoder from Qwen-Image,
# the VAE from Qwen-Image-Layered, and the processor from Qwen-Image-Edit.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Layered-Control", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image-Layered", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
)

# Text describing the layer to extract.
prompt = "A cartoon skeleton character wearing a purple hat and holding a gift box"

# Download the example input, convert it to RGBA, and resize it to the native 1024x1024 resolution.
response = requests.get("https://modelscope.oss-cn-beijing.aliyuncs.com/resource/images/trick_or_treat.png", stream=True)
response.raise_for_status()
input_image = Image.open(response.raw).convert("RGBA").resize((1024, 1024))
input_image.save("image_input.png")

# Extract the layer; the model returns a single image matching the prompt.
images = pipe(
    prompt,
    seed=0,
    num_inference_steps=30, cfg_scale=4,
    height=1024, width=1024,
    layer_input_image=input_image,
    layer_num=0,
)
images[0].save("image.png")
```
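Because the model returns only the layer that matches the prompt, extracting several layers from one image means calling the pipeline once per prompt, as in the examples above. The sketch below wraps the call from the inference code in a loop, reusing exactly the keyword arguments shown there. `extract_layers` and the `layer_<i>.png` file naming are hypothetical; the pipeline is accepted as any callable so the function can also be exercised with a stub.

```python
from typing import Any, Callable, Sequence


def extract_layers(
    pipe: Callable[..., Sequence[Any]],
    input_image: Any,
    prompts: Sequence[str],
    seed: int = 0,
    size: tuple[int, int] = (1024, 1024),  # (width, height), same order as PIL
    out_prefix: str = "layer",
) -> list[str]:
    """Run the pipeline once per prompt, save the first returned image, and return the file paths."""
    width, height = size
    paths = []
    for i, prompt in enumerate(prompts):
        # Same call as the inference code, with only the prompt changing.
        images = pipe(
            prompt,
            seed=seed,
            num_inference_steps=30, cfg_scale=4,
            height=height, width=width,
            layer_input_image=input_image,
            layer_num=0,
        )
        path = f"{out_prefix}_{i}.png"
        images[0].save(path)
        paths.append(path)
    return paths


# Usage with `pipe` and `input_image` from the inference code (prompts from Example 1):
# extract_layers(pipe, input_image, ["Cloud", "Text 'TRICK OR TREAT'", "A gift box"])
```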