microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • Updated about 4 hours ago • 767k • 1.23k
Running • 543 • Vision Arena (Testing VLMs side-by-side) • Analyze images to detect and label objects