optical_encoder.* in the checkpoint is only 5 tensors — full DINOv3 or a stand-in?

by Salmanalfarisi1 - opened 1 day ago

Thanks for releasing the pretrained SAR encoder directly — big help.

Quick question while integrating it into a SAR-optical VLM project: the
optical_encoder.* weights bundled in SARMAE_vitb_checkpoint-last are only 5
tensors (a patch-embed conv + norm + cls_token, matching the
SimplifiedDINOv3 class in mae_contrastive.py on GitHub), not a full DINOv3
ViT-B transformer. The paper describes the optical branch as a full frozen
DINOv3.
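
For anyone who wants to reproduce the count, here is a minimal sketch of how the tensors under the `optical_encoder.` prefix can be listed. The helper names `summarize_prefix` and `load_state_dict` are hypothetical, and unwrapping a top-level `"model"` key is an assumption about the checkpoint layout, not something confirmed by the repo.

```python
def summarize_prefix(state_dict, prefix="optical_encoder."):
    """Return (name, shape) pairs for every entry whose key starts with prefix."""
    rows = []
    for name, tensor in state_dict.items():
        if name.startswith(prefix):
            # Anything without a .shape attribute is reported with an empty shape.
            rows.append((name, tuple(getattr(tensor, "shape", ()))))
    return rows


def load_state_dict(path):
    """Load a checkpoint on CPU with PyTorch.

    Assumption: the weights are either the top-level object or stored
    under a "model" key; adjust if the file is laid out differently.
    """
    import torch  # imported lazily so summarize_prefix works without PyTorch

    ckpt = torch.load(path, map_location="cpu")
    if isinstance(ckpt, dict) and "model" in ckpt:
        return ckpt["model"]
    return ckpt


def print_summary(path, prefix="optical_encoder."):
    """Print each matching tensor name with its shape, then the total count."""
    rows = summarize_prefix(load_state_dict(path), prefix)
    for name, shape in rows:
        print(f"{name}: {shape}")
    print(f"{len(rows)} tensors under {prefix!r}")
```

Calling `print_summary` with the local path to the downloaded checkpoint prints one line per tensor under the prefix, which makes it easy to compare against the parameter names of a full ViT-B.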

Was this lightweight version what actually produced the paper's SARC
results, or is it a simplified stand-in for the public release? I also opened
a GitHub issue with more detail: https://github.com/MiliLab/SARMAE/issues/9.
Thanks again for the weights!
