license: apache-2.0
Pretrained and fine-tuned weights for the CVPR 2025 paper "Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation".