Readout Head Weights

The weights/ folder contains the pre-trained weights of the readout heads, named according to the following convention:

readout_<base-model>_<task-type>_<head-type>.pt
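
As an illustration of the convention above, a filename can be split back into its three fields. This is a minimal sketch, not part of the repository: the names `ReadoutName` and `parse_readout_name` are hypothetical, and it assumes that none of the fields contains an underscore, which holds for the files listed in this document.

```python
from typing import NamedTuple


class ReadoutName(NamedTuple):
    """Fields encoded in a readout head filename (hypothetical helper type)."""
    base_model: str  # e.g. "sdxl" or "sdv15"
    task_type: str   # e.g. "spatial" or "drag"
    head_type: str   # e.g. "pose", "depth", "edge"


def parse_readout_name(filename: str) -> ReadoutName:
    """Split readout_<base-model>_<task-type>_<head-type>[.pt] into its fields."""
    # Drop the .pt extension if present.
    stem = filename[:-3] if filename.endswith(".pt") else filename
    parts = stem.split("_")
    # Assumption: exactly four underscore-separated parts, the first being "readout".
    if len(parts) != 4 or parts[0] != "readout":
        raise ValueError(f"not a readout head filename: {filename!r}")
    return ReadoutName(*parts[1:])
```

For example, `parse_readout_name("readout_sdv15_drag_appearance.pt")` yields the base model `sdv15`, the task type `drag`, and the head type `appearance`.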

Spatially Aligned Control

readout_sdxl_spatial_pose.pt
readout_sdv15_spatial_pose.pt
- Readout head trained with OpenPose pose skeletons as supervision on PascalVOC images, filtered to only those containing people.
readout_sdxl_spatial_depth.pt
readout_sdv15_spatial_depth.pt
- Readout head trained with MiDaS depth maps as supervision on PascalVOC images.
readout_sdxl_spatial_edge.pt
readout_sdv15_spatial_edge.pt
- Readout head trained with HED edge detections as supervision on PascalVOC images.

Drag-Based Manipulation

readout_sdxl_drag_correspondence.pt
readout_sdv15_drag_correspondence.pt
- Readout head trained with a contrastive loss, using CoTracker point tracks across pairs of DAVIS video frames as supervision.
readout_sdxl_drag_appearance.pt
readout_sdv15_drag_appearance.pt
- Readout head trained with a triplet loss, using real frames as positives and SDEdit-ed frames as negatives, derived from DAVIS videos.
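
To check that a local copy of the weights/ folder is complete, the file list above can be regenerated from the naming convention and compared against the directory. This is a rough sketch under stated assumptions: the helper names `expected_files` and `missing_files` are hypothetical, and the model and head names are simply copied from the filenames listed in this document.

```python
from pathlib import Path

# Base models and head types, taken from the filenames listed above.
BASE_MODELS = ("sdxl", "sdv15")
HEADS = {
    "spatial": ("pose", "depth", "edge"),
    "drag": ("correspondence", "appearance"),
}


def expected_files() -> list[str]:
    """Build every filename implied by the naming convention."""
    return [
        f"readout_{base}_{task}_{head}.pt"
        for task, heads in HEADS.items()
        for head in heads
        for base in BASE_MODELS
    ]


def missing_files(weights_dir: str = "weights") -> list[str]:
    """Return the expected filenames that are not present in weights_dir."""
    root = Path(weights_dir)
    return [name for name in expected_files() if not (root / name).is_file()]
```

Calling `missing_files()` from the repository root returns an empty list when all ten listed files are present.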