PortraitCraft Challenge Track 2: Portrait Composition Generation - Top 1 Solution
This repository contains the Top 1 solution code for the PortraitCraft Challenge Track 2: Portrait Composition Generation.
Our solution is a diffusion pipeline built on Z-Image-Turbo. It uses a ControlNet for precise pose conditioning and LoRA weights fine-tuned with DPO (Direct Preference Optimization).
Getting Started
1. Model Preparation
Before running the code, you need to download the required base models and ControlNet models into the models directory. You can choose to download them from either Hugging Face or ModelScope.
From Hugging Face:
Tongyi-MAI/Z-Image-Turbo
alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1
From ModelScope:
Tongyi-MAI/Z-Image-Turbo
PAI/Z-Image-Turbo-Fun-Controlnet-Union-2.1
Make sure the downloaded weights are placed where the scripts expect them; otherwise the models will fail to load.
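A small check script can confirm the layout before you run anything. The sketch below assumes each repository is downloaded into its own subdirectory of models/ named after the repository (models/Z-Image-Turbo and models/Z-Image-Turbo-Fun-Controlnet-Union-2.1). These directory names and the helper missing_models are hypothetical, so adjust them to the paths the scripts actually load.

```python
from pathlib import Path

# Hypothetical layout: one subdirectory per downloaded repository under models/.
# Adjust these names to match the paths the inference and training scripts load.
EXPECTED_MODEL_DIRS = [
    "Z-Image-Turbo",
    "Z-Image-Turbo-Fun-Controlnet-Union-2.1",
]


def missing_models(models_root: str = "models") -> list[str]:
    """Return the expected model directories that are absent or empty."""
    root = Path(models_root)
    missing = []
    for name in EXPECTED_MODEL_DIRS:
        path = root / name
        # An existing but empty directory usually means an interrupted download.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_models()
    if absent:
        print("Missing or empty model directories:", ", ".join(absent))
    else:
        print("All expected model directories are present.")
```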
2. Inference
To generate portrait images from the provided JSON task files (each task includes a text prompt and a pose control prompt), run the evaluation script:
python inference/eval.py
The script loads the base model, the ControlNet, and the trained LoRA weights, then generates the images used for the final submission.
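The JSON task format is not documented here. For illustration only, the sketch below assumes a hypothetical schema: a list of objects, each with "id", "prompt", and "pose" (a path to a pose control image). The Task class, the load_tasks helper, and the field names are all assumptions, and inference/eval.py may use a different structure.

```python
import json
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    prompt: str     # text prompt describing the portrait
    pose_path: str  # path to the pose control image (hypothetical field)


def load_tasks(json_text: str) -> list[Task]:
    """Parse a task list in the hypothetical {"id", "prompt", "pose"} schema."""
    records = json.loads(json_text)
    tasks = []
    for record in records:
        # Fail early with a clear message if a required field is missing.
        missing = [key for key in ("id", "prompt", "pose") if key not in record]
        if missing:
            raise ValueError(f"task {record.get('id', '?')} is missing fields: {missing}")
        tasks.append(Task(str(record["id"]), record["prompt"], record["pose"]))
    return tasks


example = '[{"id": 1, "prompt": "a woman in a red coat, rule-of-thirds framing", "pose": "poses/0001.png"}]'
for task in load_tasks(example):
    print(task.task_id, task.prompt, task.pose_path)
```

Validating every task up front means a malformed entry is reported before any GPU time is spent on generation.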
3. Calculate Parameters
If you want to check the parameter count of the complete model pipeline (including DiT, Text Encoder, VAE, ControlNet, and LoRA), run:
python inference/calculate_params.py
Expected Output:
=== Components Parameter Count ===
DiT (Transformer): 6.1549 B
Text Encoder: 4.0225 B
VAE (Encoder + Decoder): 0.0838 B
ControlNet: 3.3562 B
LoRA (Included in DiT): 0.1588 B
=== Parameter Summary ===
Total Model Parameters: 13.6174 B
Total Model Parameters: 13,617,422,307
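(Note: The LoRA parameters are already counted in the DiT figure, so they are not added to the total a second time: 6.1549 + 4.0225 + 0.0838 + 3.3562 = 13.6174 B. During inference, the LoRA weights are fused directly into the DiT transformer backbone, so they add no extra computation overhead.)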
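Fusing LoRA weights relies on a standard property of LoRA: the low-rank update B·A, scaled by alpha/r, can be added into the frozen weight W once, producing a single matrix with W's original shape. The pure-Python sketch below illustrates this on a toy linear layer. The sizes, rank, and alpha are arbitrary assumptions, and this is not the fusion code used in this repository.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in m]


def random_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def fuse_lora(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A); the result has the same shape as W."""
    scale = alpha / rank
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


random.seed(0)
d_in, d_out, rank, alpha = 6, 4, 2, 4.0  # toy sizes, chosen arbitrarily
w = random_matrix(d_out, d_in)  # frozen base weight
a = random_matrix(rank, d_in)   # LoRA down-projection
b = random_matrix(d_out, rank)  # LoRA up-projection
x = [random.uniform(-1.0, 1.0) for _ in range(d_in)]

# Unfused: base output plus a separate low-rank branch (two extra products per call).
scale = alpha / rank
unfused = [base + scale * low for base, low in zip(matvec(w, x), matvec(b, matvec(a, x)))]

# Fused: a single product with the merged weight.
w_fused = fuse_lora(w, a, b, alpha, rank)
fused = matvec(w_fused, x)

print("max abs difference:", max(abs(u - f) for u, f in zip(unfused, fused)))
print("LoRA parameters in this layer:", rank * (d_in + d_out))
print("fused weight shape:", len(w_fused), "x", len(w_fused[0]))
```

The two outputs agree up to floating-point rounding. Because the fused matrix keeps W's shape, a forward pass after merging costs the same as the base layer, and the r·(d_in + d_out) LoRA factor parameters are needed only before fusion.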
4. Training
We use a DPO-based LoRA training strategy combined with ControlNet to align the model with human preferences and improve composition quality.
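The exact training objective is not spelled out here. For orientation, one widely used preference objective for diffusion models is Diffusion-DPO. It rewards the trainable policy for lowering its denoising error, relative to a frozen reference model, more on the preferred ("winning") sample than on the rejected ("losing") one. The sketch below computes that per-pair loss from already-computed errors, with constant factors such as timestep weighting folded into beta. The function name, the beta values, and the choice of this particular variant are assumptions, not a description of train/train_ai4va_pose_controlnet_dpo_lora_8x1.sh.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def diffusion_dpo_loss(err_w_policy: float, err_w_ref: float,
                       err_l_policy: float, err_l_ref: float, beta: float) -> float:
    """Per-pair Diffusion-DPO-style loss from squared prediction errors.

    err_w_*: error on the preferred sample; err_l_*: error on the rejected sample.
    """
    # Negative margin: the policy improves over the reference more on the winner.
    margin = (err_w_policy - err_w_ref) - (err_l_policy - err_l_ref)
    # -log(sigmoid(-beta * margin)) is identical to softplus(beta * margin).
    return softplus(beta * margin)


# Policy identical to the reference: margin is 0, so the loss is log(2).
print(diffusion_dpo_loss(0.30, 0.30, 0.40, 0.40, beta=100.0))
# Policy fits the winner better and the loser worse than the reference: loss near 0.
print(diffusion_dpo_loss(0.25, 0.30, 0.45, 0.40, beta=100.0))
```

Only the LoRA parameters would receive gradients from this loss, while the reference errors come from the frozen base model.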
To start the training process, execute the provided bash script:
bash train/train_ai4va_pose_controlnet_dpo_lora_8x1.sh