PortraitCraft Challenge Track 2: Portrait Composition Generation - Top 1 Solution

This repository contains the Top 1 solution code for the PortraitCraft Challenge Track 2: Portrait Composition Generation.

Our solution is a diffusion pipeline built on Z-Image-Turbo, using ControlNet for pose conditioning and fine-tuned with DPO-LoRA.


🚀 Getting Started

1. Model Preparation

Before running the code, download the required base model and ControlNet weights into the models directory. Both are available from either Hugging Face or ModelScope.

From Hugging Face:

  • Tongyi-MAI/Z-Image-Turbo
  • alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1

From ModelScope:

  • Tongyi-MAI/Z-Image-Turbo
  • PAI/Z-Image-Turbo-Fun-Controlnet-Union-2.1

Ensure the downloaded weights are placed correctly so the scripts can load them properly.
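As a minimal sketch of the Hugging Face route, the snippet below fetches both repositories with `huggingface_hub.snapshot_download`. The repo IDs come from the list above; the helper names and the `models/<repo-name>` folder layout are assumptions for illustration, so check the paths the scripts actually expect before relying on them.

```python
from pathlib import Path

# Repo IDs are taken from the Hugging Face list above.
HF_REPOS = [
    "Tongyi-MAI/Z-Image-Turbo",
    "alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1",
]


def local_dir_for(repo_id: str, root: str = "models") -> Path:
    """Map 'org/name' to '<root>/name'. This layout is an assumption, not confirmed by the repo."""
    return Path(root) / repo_id.split("/")[-1]


def download_all(root: str = "models") -> None:
    """Download every repo in HF_REPOS into its own folder under `root`."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in HF_REPOS:
        target = local_dir_for(repo_id, root)
        target.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `download_all()` once from the repository root would populate the models directory under this assumed layout.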


2. Inference

To generate portrait images from the provided JSON tasks (each of which includes a text prompt and a pose control prompt), run the evaluation script:

python inference/eval.py

The script will load the base model, ControlNet, and the trained LoRA weights to generate the final submitted images.


3. Calculate Parameters

If you want to check the parameter count of the complete model pipeline (including DiT, Text Encoder, VAE, ControlNet, and LoRA), run:

python inference/calculate_params.py

Expected Output:

=== Components Parameter Count ===
DiT (Transformer): 6.1549 B
Text Encoder: 4.0225 B
VAE (Encoder + Decoder): 0.0838 B
ControlNet: 3.3562 B
LoRA (Included in DiT): 0.1588 B

=== Parameter Summary ===
Total Model Parameters: 13.6174 B
Total Model Parameters: 13,617,422,307
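The reported total can be checked by hand. The short sketch below copies the figures from the expected output and confirms that the four components sum to the stated total, with the LoRA figure left out of the sum because the output labels it as included in the DiT. The variable names are illustrative only.

```python
# Values in billions, copied from the expected output above.
components = {
    "DiT (Transformer)": 6.1549,
    "Text Encoder": 4.0225,
    "VAE (Encoder + Decoder)": 0.0838,
    "ControlNet": 3.3562,
}
lora_b = 0.1588  # reported as included in the DiT figure, so not added again

total_b = sum(components.values())
lora_share_of_dit = lora_b / components["DiT (Transformer)"]

print(f"Sum of components: {total_b:.4f} B")
print(f"LoRA share of DiT: {lora_share_of_dit:.2%}")
```

The sum agrees with the 13.6174 B in the summary, which is consistent with the LoRA parameters being counted once, inside the DiT.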

(Note: During inference, the LoRA weights are fused directly into the DiT transformer backbone, so they add no extra computation. They are counted within the DiT figure above and do not increase the total.)
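To illustrate why fusing adds no inference cost, here is a toy sketch of the generic LoRA merge, W_fused = W + scale * (B A), in plain Python. The matrices, the `scale` value (standing in for alpha / rank), and the helper functions are all made up for illustration; this is not the repository's fusion code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b, scale=1.0):
    """Return a + scale * b, element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


W = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]  # frozen base weight, shape (2, 3)
B = [[0.5], [-1.0]]                       # LoRA up-projection, shape (2, 1)
A = [[1.0, 0.0, 2.0]]                     # LoRA down-projection, shape (1, 3)
scale = 0.5                               # assumed stand-in for alpha / rank
x = [[1.0], [2.0], [3.0]]                 # input column vector, shape (3, 1)

# Unfused: base path plus a separate low-rank path (two extra matmuls per call).
unfused = add(matmul(W, x), matmul(B, matmul(A, x)), scale)

# Fused: fold the low-rank update into the weight once, then use a single matmul.
W_fused = add(W, matmul(B, A), scale)
fused = matmul(W_fused, x)

print(unfused == fused)
```

After the merge, `W_fused` has the same shape as `W`, so the forward pass costs the same as the base layer while producing the adapted output.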


4. Training

We use a DPO-based LoRA training strategy combined with ControlNet to align the model with human preferences and improve composition quality.
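For readers unfamiliar with DPO, the sketch below shows the generic preference loss for a single (preferred, rejected) pair. This README does not state the exact objective used, so treat it as background rather than the training code. Variants for diffusion models commonly replace the log-probabilities with denoising-error terms. The function name, argument names, and `beta` default are assumptions.

```python
import math


def dpo_loss(pol_w: float, pol_l: float, ref_w: float, ref_l: float, beta: float = 0.1) -> float:
    """Generic DPO loss for one pair.

    pol_* are log-probabilities under the trainable policy, ref_* under the frozen
    reference model; *_w is the preferred sample, *_l the rejected one.
    """
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    z = beta * margin
    # -log(sigmoid(z)) == softplus(-z), computed in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# The loss is lower when the policy favors the preferred sample more than the reference does.
good = dpo_loss(pol_w=-1.0, pol_l=-3.0, ref_w=-2.0, ref_l=-2.0)
bad = dpo_loss(pol_w=-3.0, pol_l=-1.0, ref_w=-2.0, ref_l=-2.0)
print(good < bad)
```

With LoRA, only the low-rank adapter weights would receive gradients from such a loss, while the base model doubles as the frozen reference.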

To start the training process, execute the provided bash script:

bash train/train_ai4va_pose_controlnet_dpo_lora_8x1.sh