metadata
license: apache-2.0
task_categories:
  - video-retrieval
  - image-retrieval
tags:
  - composed-video-retrieval
  - composed-image-retrieval
  - vision-language
  - pytorch
  - aaai-2026

🎬 (AAAI 2026) ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval (Model Weights)

1School of Software, Shandong University    
2School of Computer Science and Technology, Shandong Jianzhu University   
✉ Corresponding author

AAAI 2026 Paper Project Page GitHub

This repository hosts the official pre-trained model weights for ReTrack, an evidence-driven framework designed to calibrate directional bias in composed features for both Composed Video Retrieval (CVR) and Composed Image Retrieval (CIR) tasks.


📌 Model Information

1. Model Name

ReTrack (Evidence-Driven Dual-Stream Directional Anchor Calibration Network) Checkpoints.

2. Task Type & Applicable Tasks

  • Task Type: Composed Video Retrieval (CVR) and Composed Image Retrieval (CIR).
  • Applicable Tasks: Retrieving a target video or image from a reference visual input combined with a modification text. The model is designed to reduce the uncertainty caused by highly similar retrieval candidates in multi-modal queries.
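
To make the retrieval setting concrete, here is a minimal, generic sketch of the final ranking step: a composed query embedding is compared against candidate embeddings by cosine similarity. It is not ReTrack's implementation. The function names and the toy vectors are hypothetical, and how the model actually fuses the reference visual input with the modification text is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query_embedding, gallery):
    """Return candidate names sorted from most to least similar to the query.

    `query_embedding` stands in for the composed (reference + text) feature;
    `gallery` maps a candidate name to its embedding. Both are assumptions
    made for illustration only.
    """
    scores = {name: cosine(query_embedding, emb) for name, emb in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: "a" and "c" are both close to the query, "b" is orthogonal to it.
toy_query = [1.0, 0.0]
toy_gallery = {"a": [0.9, 0.1], "b": [0.0, 1.0], "c": [0.5, 0.5]}
ranking = rank_candidates(toy_query, toy_gallery)
```

When several candidates score almost identically, small errors in the composed embedding can reorder them, which is the ambiguity the model description refers to.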

3. Project Introduction

ReTrack is an open-source PyTorch framework built on top of BLIP-2 (via Salesforce LAVIS) that aims to improve multi-modal query understanding. It features:

  • 🎯 Dual-Stream Directional Anchor Calibration: Explicitly identifies and calibrates visual and textual semantic contributions to resolve directional bias.
  • ⚖️ Reliable Evidence-Driven Alignment: Leverages Dempster-Shafer Theory to evaluate similarity reliability, minimizing ambiguity among candidates.
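
For readers unfamiliar with Dempster-Shafer Theory, the sketch below shows its core operation, Dempster's rule of combination, on a toy two-hypothesis frame. It is a generic textbook illustration and not a description of how ReTrack applies the theory; the function name, hypotheses, and mass values are all hypothetical.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    with masses summing to 1. Mass assigned to the whole frame expresses
    uncertainty ("I don't know").
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                # The two sources support incompatible hypotheses.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalise by the non-conflicting mass.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


# Toy frame with two candidate hypotheses.
A = frozenset({"candidate_a"})
B = frozenset({"candidate_b"})
THETA = A | B  # the whole frame, i.e. residual uncertainty

source_1 = {A: 0.6, THETA: 0.4}  # leans towards A, 40% uncertain
source_2 = {B: 0.5, THETA: 0.5}  # leans towards B, 50% uncertain
fused = dempster_combine(source_1, source_2)
```

In this toy case the conflicting mass is 0.6 × 0.5 = 0.3, so the remaining products are divided by 0.7. The fused result still favours `candidate_a` but keeps explicit mass on the whole frame, which is how the theory represents how reliable a judgement is.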

4. Training Data Source & Hosted Weights

The framework supports training on the WebVid-CoVR dataset for video retrieval and on the FashionIQ / CIRR datasets for image retrieval.

This Hugging Face repository provides the following pre-trained checkpoint:

  • 📄 ReTrack-WebVid-Frame1.ckpt: The checkpoint trained on the WebVid-CoVR dataset (1-frame configuration).

🚀 Usage & Basic Inference

These weights are meant to be evaluated with the code in the ReTrack GitHub repository, which is configured through Hydra.

Step 1: Prepare the Environment

We recommend using Anaconda. Clone the repository and install dependencies:

git clone https://github.com/iLearn-Lab/AAAI26-ReTrack.git
cd AAAI26-ReTrack
conda create -n retrack python=3.8 -y
conda activate retrack
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt

Step 2: Download Model Weights & Prepare Data

  1. Download ReTrack-WebVid-Frame1.ckpt from this Hugging Face repository.
  2. Place the checkpoint in the appropriate directory as expected by your Hydra configuration (e.g., within a checkpoints/ folder).
  3. Ensure the WebVid-CoVR dataset is placed under your defined datasets_dir in configs/machine/default.yaml.
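
If you prefer to script the download, the following sketch uses `hf_hub_download` from the `huggingface_hub` package. The `repo_id` argument is a placeholder you must fill in with this repository's Hugging Face id, and the `checkpoints` target folder is only the example location mentioned above, not a path the code base is known to require.

```python
from pathlib import Path

# File name as listed in this repository.
CKPT_NAME = "ReTrack-WebVid-Frame1.ckpt"


def fetch_checkpoint(repo_id: str, target_dir: str = "checkpoints") -> Path:
    """Download the checkpoint into `target_dir` and return its local path.

    `repo_id` is the "<owner>/<name>" id of this Hugging Face repository;
    it is deliberately left as a parameter rather than guessed here.
    """
    # Imported lazily so this module can be loaded without huggingface_hub.
    from huggingface_hub import hf_hub_download

    Path(target_dir).mkdir(parents=True, exist_ok=True)
    local_path = hf_hub_download(
        repo_id=repo_id,
        filename=CKPT_NAME,
        local_dir=target_dir,
    )
    return Path(local_path)
```

A call such as `fetch_checkpoint("<owner>/<name>")` returns the path to pass as `model.ckpt_path` in the evaluation step.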

Step 3: Run Evaluation

To evaluate the trained CVR model, use test.py and specify the path to your downloaded checkpoint via Hydra CLI overrides:

python test.py \
    model.ckpt_path=/path/to/your/ReTrack-WebVid-Frame1.ckpt \
    +test=webvid-covr

(Refer to the configs/ directory in the code repository for further hyperparameter and path settings.)
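
In the command above, `model.ckpt_path=...` overrides an existing configuration key, while the leading `+` in `+test=webvid-covr` tells Hydra to add an entry that is not already in the composed configuration. As a convenience, the sketch below wraps the same command in Python and fails early if the checkpoint file is missing. The helper names are hypothetical and not part of the ReTrack code base; it assumes it is run from the repository root, where `test.py` lives.

```python
import subprocess
import sys
from pathlib import Path


def build_eval_command(ckpt_path, test_config="webvid-covr"):
    """Build the evaluation command shown above as an argument list."""
    return [
        sys.executable,                   # the current Python interpreter
        "test.py",
        f"model.ckpt_path={ckpt_path}",   # override an existing key
        f"+test={test_config}",           # add the test config via Hydra's "+" syntax
    ]


def run_evaluation(ckpt_path, repo_dir="."):
    """Check that the checkpoint exists, then launch test.py in `repo_dir`."""
    ckpt = Path(ckpt_path).expanduser().resolve()
    if not ckpt.is_file():
        raise FileNotFoundError(f"checkpoint not found: {ckpt}")
    return subprocess.run(build_eval_command(ckpt), cwd=repo_dir, check=True)
```

Checking the path before launching avoids waiting for model and dataset setup only to fail on a mistyped checkpoint location.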


⚠️ Limitations & Notes

  • Configuration: ReTrack is configured through Hydra and runs on Lightning Fabric. Make sure you are familiar with overriding configurations via the CLI or modifying the YAML files in the configs/ directory.
  • Environment: The project was developed and evaluated on Python 3.8 and PyTorch 2.1.0; substantially different versions may lead to unexpected behavior.
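
A quick way to confirm that your environment matches the versions above is a small check like the following. This is a sketch, not part of the ReTrack repository; the function names are hypothetical, and it compares only the major/minor Python version and the PyTorch release number (ignoring build suffixes such as `+cu118`).

```python
import sys

EXPECTED_PYTHON = (3, 8)
EXPECTED_TORCH = "2.1.0"


def version_warnings(python_version, torch_version):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    warnings = []
    if tuple(python_version[:2]) != EXPECTED_PYTHON:
        found = ".".join(str(part) for part in python_version[:2])
        warnings.append(f"Python {found} found, 3.8 expected")
    if torch_version is None:
        warnings.append("PyTorch is not installed")
    elif torch_version.split("+")[0] != EXPECTED_TORCH:
        # Strip local build tags like "+cu118" before comparing.
        warnings.append(f"PyTorch {torch_version} found, {EXPECTED_TORCH} expected")
    return warnings


def installed_torch_version():
    """Return torch.__version__, or None if PyTorch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    return torch.__version__


if __name__ == "__main__":
    for message in version_warnings(sys.version_info, installed_torch_version()):
        print("warning:", message)
```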

📝⭐️ Citation

If you find our framework, code, or these weights useful in your research, please consider leaving a Star ⭐️ on our GitHub repository and citing our AAAI 2026 paper:

@inproceedings{ReTrack,
  title={ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval},
  author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Huang, Qinlei and Qiu, Guozhi and Fu, Zhiheng and Liu, Meng},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}