Training uses the Adam optimizer with a constant learning rate of 1e-5 for 4000 steps (batch_size=128). Only the vision encoder is fine-tuned.
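
As a rough illustration of this setup, the PyTorch sketch below freezes every parameter except the vision encoder's and hands only those to Adam with a fixed learning rate. The model class, its `vision_encoder` and `text_encoder` attribute names, the layer sizes, and the `build_optimizer` helper are all hypothetical stand-ins, since the actual architecture and training code are not given here.

```python
import torch
from torch import nn


class ToyVisionLanguageModel(nn.Module):
    """Hypothetical two-tower model; the real architecture is not specified."""

    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(32, 16)  # placeholder for the image tower
        self.text_encoder = nn.Linear(24, 16)    # placeholder for the text tower

    def forward(self, images, texts):
        return self.vision_encoder(images), self.text_encoder(texts)


def build_optimizer(model, lr=1e-5):
    # Freeze everything, then re-enable gradients for the vision encoder only.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.vision_encoder.parameters():
        p.requires_grad = True
    trainable = [p for p in model.parameters() if p.requires_grad]
    # Constant learning rate: no scheduler is attached to the optimizer.
    return torch.optim.Adam(trainable, lr=lr)


NUM_STEPS = 4000   # training steps, as stated above
BATCH_SIZE = 128   # batch size, as stated above

model = ToyVisionLanguageModel()
optimizer = build_optimizer(model)
```

Passing only the trainable parameters to the optimizer keeps Adam from allocating moment estimates for the frozen weights; other hyperparameters (betas, epsilon, weight decay) are left at PyTorch defaults here because they are not stated.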
Test set accuracy:
Base model