YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
Model Introduction
We introduce OrthoTryOn, a unified and parameter-efficient framework for fashion image generation, designed to mitigate inter-task interference in shared adaptation and enable high-quality virtual try-on, garment reconstruction, and pose transfer within a single model. Its plug-and-play design can further extend to broader multi-task scenarios.
Highlights
- πConflict-Free Unified Generation: OrthoTryOn supports virtual try-on, garment reconstruction, and pose transfer within a single shared model, avoiding the deployment cost of multiple task-specific adapters.
- πGeometric Task Decoupling: Our Orthogonal Subspace Projection introduces task-specific orthogonal rotations into the shared LoRA bottleneck, reducing destructive gradient interference while preserving effective cross-task knowledge sharing.
- πFisher-Guided Inference: Our parameter-free Fisher-guided Negative Guidance identifies and suppresses the most confusable task during inference, mitigating residual semantic leakage and improving generation fidelity.
Showcase
Quick Start
Installation
conda create -n orthotryon python=3.10 -y
conda activate orthotryon
pip install --upgrade pip
pip install -r requirements.txt
Data Preparation
Virtual Try-On
Download the VITON-HD dataset from the official repository.
To obtain garment-removed reference-person images, we follow the image synthesis protocol of Any2AnyTryon. These synthesized reference images are stored under edited_image/.
We extract human skeleton maps with MMPose using the HRNet-W48 COCO top-down model:
Config: td-hm_hrnet-w48_8xb32-210e_coco-256x192
Checkpoint: td-hm_hrnet-w48_8xb32-210e_coco-256x192-0e67c616_20220913.pth
We generate garment-specific editing instructions using Qwen2.5-VL-7B-Instruct. During training annotation generation, the first image is the edited image and the second image is the ground-truth target image. We use the following prompt:
You are an expert fashion editor. You are provided with two images:
- The first image is the Source Image (Before).
- The second image is the Target Image (After).
Analyze the difference in the clothing between the Source and the Target. Provide a single, concise imperative editing instruction to transform the Source into the Target.
Examples:
- βChange the blue denim jacket to a red silk blouse.β
- βAdd a black leather belt to the dress.β
- βRemove the graphic logo from the t-shirt.β
- βChange the white long-sleeved shirt to a red-and-white striped short-sleeved shirt.β
Focus only on the garment changes. Output ONLY the instruction sentence.
At test time, users may separately generate a detailed description of the target garment. This description can replace the target-garment-related content in the editing instruction and should be saved under edit_instructs/. Pre-generated instruction annotations are available from here.
<VITONHD_ROOT>/
βββ test_pairs.txt
βββ test/
βββ image/
βββ edited_image/
βββ cloth/
βββ mmpose_skeleton/
βββ edit_instructs/
βββ paired_edit_instructs/ # Only for paired VTON testing
Garment Reconstruction (VTOFF)
VTOFF uses the same VITON-HD dataset and directory structure described above.
Pose Transfer
Download the DeepFashion dataset from the official project page. Follow the same MMPose HRNet-W48 procedure used for VTON to extract target-pose skeleton maps, and save them under test_skeleton/.
<DEEPFASHION_ROOT>/
βββ fasion-resize-pairs-test.csv
βββ test_highres/
βββ test_skeleton/
Model Download
Download the LongCat-Image-Edit base model and the OrthoTryOn LoRA checkpoint from Hugging Face.
Run Virtual Try-On
python scripts/inference_ortho.py \
--task vton \
--vton_root_dir /path/to/VITONHD \
--test_pairs_file test_pairs.txt \
--setting unpaired/paired \
--model_path checkpoints/LongCat-Image-Edit \
--lora_path checkpoints/OrthoTryOn \
--sampler cfg
Optional VTON Refinement
For background-preserving refinement, use refine.py together with the agnostic mask to repaint non-garment regions with the original ground-truth image.
Run Garment Reconstruction
python scripts/inference_ortho.py \
--task vtoff \
--vtoff_root_dir /path/to/VITONHD \
--test_pairs_file test_pairs.txt \
--model_path checkpoints/LongCat-Image-Edit \
--lora_path checkpoints/OrthoTryOn \
--sampler cfg
Run Pose Transfer
python scripts/inference_ortho.py \
--task pose \
--pose_root_dir /path/to/DeepFashion \
--csv_file fasion-resize-pairs-test.csv \
--model_path checkpoints/LongCat-Image-Edit \
--lora_path checkpoints/OrthoTryOn \
--sampler cfg
For Fisher-guided Negative Guidance (FNG), the unified inference interface supports vton_cfg, vtoff_cfg, and pose_cfg samplers. When using one of these samplers, please also provide the corresponding dataset root path so that the required conditional inputs can be loaded properly, e.g., --vton_root_dir, --vtoff_root_dir, and --pose_root_dir.
Training Pipeline
Training data preparation follows the same procedure as inference. The expected training inputs are:
Expected VTON Training Inputs
<VITONHD_ROOT>/
βββ train_dataset.jsonl
βββ train/
βββ image/
βββ edited_image/
βββ cloth/
βββ mmpose_skeleton/
βββ edit_instructs/
The train_dataset.jsonl file can be downloaded from here.
Expected Pose Transfer Training Inputs
<DEEPFASHION_ROOT>/
βββ fasion-resize-pairs-train.csv
βββ train_highres/
βββ train_skeleton/
Example Training Command
accelerate launch \
--config_file misc/accelerate_config.yaml \
train_examples/edit_lora/train_ortho.py \
--config configs/train_ortho.yaml
# or you can run the bash scripts
bash train_examples/edit_lora/train.sh
Before training, update the dataset paths in train_examples/edit_lora/train_config.yaml:
vton_root_path: /path/to/VITONHD
vtoff_root_path: /path/to/VITONHD
pose_root_path: /path/to/DeepFashion
When using a locally downloaded LongCat-Image-Edit checkpoint, also update pretrained_model_name_or_path to the corresponding local checkpoint directory.
Acknowledgements
This project is built upon the open research and engineering efforts of:
We thank the respective authors and contributors for making their models, datasets, and tooling available to the research community.
Citation
If you find this repository useful, please cite the corresponding paper after publication.
@misc{yang2026orthotryon,
title={OrthoTryOn: Geometric Orthogonalization for Conflict-Free Unified Fashion Generation},
author={Zhaotong Yang and Ying Tai and Jiahui Zhan and Yu Zheng and Jianjun Qian and Jian Yang},
year={2026},
eprint={2606.27880},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2606.27880},
}