optical_encoder.* in the checkpoint is only 5 tensors — full DINOv3 or a stand-in?
#1
by Salmanalfarisi1 - opened
Thanks for releasing the pretrained SAR encoder directly — big help.
Quick question while integrating it into a SAR-optical VLM project: the
optical_encoder.* weights bundled in SARMAE_vitb_checkpoint-last are only 5
tensors (a patch-embed conv + norm + cls_token, matching the
SimplifiedDINOv3 class in mae_contrastive.py on GitHub), not a full DINOv3
ViT-B transformer. The paper describes the optical branch as a full frozen
DINOv3.
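For anyone who wants to reproduce the count, here is a minimal sketch of filtering a state dict by key prefix. The key names and shapes in the example dict are hypothetical placeholders that mimic a patch-embed conv + norm + cls_token layout; they are not copied from the actual checkpoint. `fake_tensor` is only a stand-in so the snippet runs without PyTorch.

```python
from collections import OrderedDict
from types import SimpleNamespace


def tensors_under_prefix(state_dict, prefix):
    """Return {name: shape} for every entry whose key starts with `prefix`."""
    return OrderedDict(
        (name, tuple(value.shape))
        for name, value in state_dict.items()
        if name.startswith(prefix)
    )


def fake_tensor(*shape):
    # Stand-in for a torch.Tensor: only the .shape attribute is used here.
    return SimpleNamespace(shape=shape)


# Hypothetical key names and shapes, for illustration only; the real
# checkpoint's keys and dimensions may differ.
example_state_dict = {
    "optical_encoder.patch_embed.weight": fake_tensor(768, 3, 16, 16),
    "optical_encoder.patch_embed.bias": fake_tensor(768),
    "optical_encoder.norm.weight": fake_tensor(768),
    "optical_encoder.norm.bias": fake_tensor(768),
    "optical_encoder.cls_token": fake_tensor(1, 1, 768),
    "sar_encoder.blocks.0.attn.qkv.weight": fake_tensor(2304, 768),
}

# Keep only the optical branch and list what is there.
optical = tensors_under_prefix(example_state_dict, "optical_encoder.")
for name, shape in optical.items():
    print(name, shape)
print(len(optical), "tensors under optical_encoder.")
```

With the real file, the same function could presumably be applied to the dict returned by `torch.load(path, map_location="cpu")`. Whether the weights sit at the top level or under a nested key such as `"model"` is an assumption that would need checking against the checkpoint itself.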
Was this lightweight version what actually produced the paper's SARC
results, or is it a simplified stand-in for the public release? I have also
opened a GitHub issue with more detail: https://github.com/MiliLab/SARMAE/issues/9. Thanks again for the weights!