How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers Paper • 2106.10270 • Published Jun 18, 2021 • 1
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published about 1 month ago • 51
Many-Shot In-Context Learning in Multimodal Foundation Models Paper • 2405.09798 • Published 3 days ago • 22
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection Paper • 2405.10300 • Published 3 days ago • 16
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 8 items • Updated 5 days ago • 24
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 11 items • Updated 2 days ago • 89
Stylus: Automatic Adapter Selection for Diffusion Models Paper • 2404.18928 • Published 20 days ago • 14
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published 20 days ago • 62
STT: Stateful Tracking with Transformers for Autonomous Driving Paper • 2405.00236 • Published 19 days ago • 7
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 Paper • 2405.00664 • Published 18 days ago • 16
Self-Play Preference Optimization for Language Model Alignment Paper • 2405.00675 • Published 18 days ago • 18
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published 20 days ago • 107
Canary Collection A collection of multilingual and multitask speech to text models from NVIDIA NeMo 🐤 • 1 item • Updated Feb 19 • 14
Awesome Document AI Collection A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11 • 38
Foundation Models for Vision 🧩 Collection Foundation models for computer vision. • 24 items • Updated Mar 11 • 16
Computer Vision Backbones 🧩 Collection Collection of useful computer vision backbones to fine-tune. It also includes large image classification models, that can be used as backbone. • 22 items • Updated Sep 19, 2023 • 11
Vision Language Models Papers 🖼️💬📝 Collection Papers about vision-language models, most important ones are on top of the list. • 27 items • Updated 19 days ago • 26
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published 17 days ago • 92
Gemma release Collection Groups the Gemma models released by the Google team. • 40 items • Updated 5 days ago • 303
Notebooks Collection A collection of notebooks demonstrating PEFT methods applied to various tasks • 9 items • Updated Jan 9 • 5
Model Merging Collection Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 28 items • Updated Mar 23 • 180
BRAVE: Broadening the visual encoding of vision-language models Paper • 2404.07204 • Published Apr 10 • 14
Best Practices and Lessons Learned on Synthetic Data for Language Models Paper • 2404.07503 • Published Apr 11 • 25
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11 • 40
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models Paper • 2404.09204 • Published Apr 14 • 10
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model Paper • 2404.09967 • Published Apr 15 • 20
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published 27 days ago • 230
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published 30 days ago • 26
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4 • 57
Gecko: Versatile Text Embeddings Distilled from Large Language Models Paper • 2403.20327 • Published Mar 29 • 43
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27 • 37
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14 • 25
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated Apr 12 • 53
Awesome SFT datasets Collection A curated list of interesting datasets to fine-tune language models with. • 43 items • Updated Apr 12 • 89
Leaderboards and benchmarks ✨ Collection Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 61 items • Updated 5 days ago • 59
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 172
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research Paper • 2402.00159 • Published Jan 31 • 55
Self-Discover: Large Language Models Self-Compose Reasoning Structures Paper • 2402.03620 • Published Feb 6 • 102
World Model on Million-Length Video And Language With RingAttention Paper • 2402.08268 • Published Feb 13 • 33