Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 16 days ago • 284
ViDoRe Benchmark Collection Benchmark for document retrieval using visual features, introduced in the ColPali paper. The datasets use a QA format. • 10 items • Updated 5 days ago • 11
Article Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints • May 1 • 69
Article seemore: Implement a Vision Language Model from Scratch By AviSoori1x • Jun 23 • 69