microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • Updated 9 days ago • 688k • 1.3k
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 142