Unicorn: Text-Only Data Synthesis for Vision Language Model Training (Paper 2503.22655)
This is related to one of my papers. If you are interested, we can discuss the details via email. 😄
"When using the standalone GemmaTokenizerFast make sure to pass padding="max_length" and max_length=64 as that’s how the model was trained." Does Siglip2 support longer text input? If max_length is set to 256 or 512, will text longer than 64 tokens still be truncated?
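To make the question concrete, here is a minimal pure-Python sketch of what fixed-length padding and truncation do to a list of token ids. It does not call the real tokenizer: `pad_or_truncate`, the pad id of 0, and right-side padding are all assumptions made for illustration. It only shows the tokenizer-side effect of `max_length`; whether the model itself accepts sequences longer than 64 is a separate question this sketch does not answer.

```python
from typing import List


def pad_or_truncate(token_ids: List[int], max_length: int = 64, pad_id: int = 0) -> List[int]:
    """Hypothetical helper: return exactly `max_length` ids.

    Ids beyond `max_length` are cut off; shorter inputs are right-padded
    with `pad_id` (pad id and padding side are assumptions, not taken
    from the real tokenizer).
    """
    clipped = token_ids[:max_length]
    return clipped + [pad_id] * (max_length - len(clipped))


# A 10-token input is padded up to 64 ids.
short = pad_or_truncate(list(range(1, 11)))

# A 100-token input keeps only its first 64 ids when max_length=64 ...
long_64 = pad_or_truncate(list(range(1, 101)), max_length=64)

# ... but keeps all 100 ids (plus padding) when max_length=256.
long_256 = pad_or_truncate(list(range(1, 101)), max_length=256)

print(len(short), len(long_64), len(long_256))
```

In this sketch, raising `max_length` to 256 means nothing is cut at 64 on the tokenizer side, so the remaining question is whether the text encoder was built to handle more than 64 positions.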