prithivMLmods/Multimodal-VLM-v1.0 · Friendly note: ViLaSR relies on thinking with images, but the demo uses text-only reasoning

6 days ago

Hi!

I'm Junfei Wu, the author of ViLaSR. Thanks a lot for integrating our model into this demo — really appreciate your effort in making it accessible to the community!

Just a friendly note: ViLaSR relies on thinking with images, meaning it performs multiple drawing operations on the input image during reasoning. The current demo only supports text-only reasoning, which doesn't reflect the full behavior of the model and may underrepresent its capabilities.

To avoid confusion, it might be helpful to add a short disclaimer about this limitation. For the complete experience, we encourage researchers to check out the official ViLaSR repo for proper usage and evaluation.

Thanks again for featuring our work!

Best,
Junfei

prithivMLmods

Owner 6 days ago (edited 6 days ago)

Hello @Hyperwjf, thanks for pointing this out.
I'll make the update soon on my end, or feel free to open a PR.

Thank you!

Best,
Prithiv

prithivMLmods changed discussion status to closed 6 days ago