Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing Paper • 2407.08770 • Published 11 days ago • 16
Seamless Communication Collection A significant step towards removing language barriers through expressive, fast and high-quality AI translation. • 16 items • Updated Jan 16 • 135
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model Paper • 2407.07053 • Published 13 days ago • 37
Article Orchestration of Experts: The First-Principle Multi-Model System By alirezamsh • May 30 • 15
Towards Building Specialized Generalist AI with System 1 and System 2 Fusion Paper • 2407.08642 • Published 11 days ago • 9
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs Paper • 2407.03963 • Published 18 days ago • 13
Understanding Visual Feature Reliance through the Lens of Complexity Paper • 2407.06076 • Published 14 days ago • 4
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence Paper • 2407.07061 • Published 13 days ago • 23
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Paper • 2407.06189 • Published 14 days ago • 24
Do Vision and Language Models Share Concepts? A Vector Space Alignment Study Paper • 2302.06555 • Published Feb 13, 2023 • 7
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception Paper • 2407.08303 • Published 11 days ago • 17
AgentInstruct: Toward Generative Teaching with Agentic Flows Paper • 2407.03502 • Published 19 days ago • 34
Article Introducing NPC-Playground, a 3D playground to interact with LLM-powered NPCs Jun 5 • 14
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published 28 days ago • 52
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published 30 days ago • 43
Long Code Arena: a Set of Benchmarks for Long-Context Code Models Paper • 2406.11612 • Published Jun 17 • 20
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published Jun 17 • 54
Discovering Preference Optimization Algorithms with and for Large Language Models Paper • 2406.08414 • Published Jun 12 • 12
μLO: Compute-Efficient Meta-Generalization of Learned Optimizers Paper • 2406.00153 • Published May 31 • 9
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning Paper • 2406.00392 • Published Jun 1 • 12
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration Paper • 2406.01014 • Published Jun 3 • 29
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published Jun 4 • 36
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes Paper • 2406.02897 • Published Jun 5 • 13
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning Paper • 2406.03344 • Published Jun 5 • 16
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 23 items • Updated 11 days ago • 372
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers Paper • 2106.10270 • Published Jun 18, 2021 • 2
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18 • 52
Many-Shot In-Context Learning in Multimodal Foundation Models Paper • 2405.09798 • Published May 16 • 26
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection Paper • 2405.10300 • Published May 16 • 25
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 8 items • Updated 25 days ago • 31
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated 25 days ago • 124
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29 • 67
STT: Stateful Tracking with Transformers for Autonomous Driving Paper • 2405.00236 • Published Apr 30 • 7
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 Paper • 2405.00664 • Published May 1 • 18
Self-Play Preference Optimization for Language Model Alignment Paper • 2405.00675 • Published May 1 • 21
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 116
Canary Collection A collection of multilingual and multitask speech to text models from NVIDIA NeMo 🐤 • 1 item • Updated 5 days ago • 16
Awesome Document AI Collection A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11 • 43