Vision - a Ambroser53 Collection

Ambroser53 's Collections

Embed

LoRA

Vision

Speech

active learning

SSM

RL

TTS

context

Vision

updated Jul 22, 2024

InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions

Paper • 2401.13313 • Published Jan 24, 2024 • 5
BAAI/Bunny-v1_0-4B

Text Generation • Updated Jun 24, 2024 • 162 • 9
What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 102
Jina CLIP: Your CLIP Model Is Also Your Text Retriever

Paper • 2405.20204 • Published May 30, 2024 • 35
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17, 2024 • 61
VoCo-LLaMA: Towards Vision Compression with Large Language Models

Paper • 2406.12275 • Published Jun 18, 2024 • 31
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Paper • 2406.13923 • Published Jun 20, 2024 • 23
Instruction Pre-Training: Language Models are Supervised Multitask Learners

Paper • 2406.14491 • Published Jun 20, 2024 • 90
ColPali: Efficient Document Retrieval with Vision Language Models

Paper • 2407.01449 • Published Jun 27, 2024 • 45
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Paper • 2407.12594 • Published Jul 17, 2024 • 19