Collections
Collections including paper arxiv:2403.16971
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 119
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 44
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 9
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 62

- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 37
- microsoft/phi-1_5
  Text Generation • Updated • 123k • 1.28k
- Language models scale reliably with over-training and on downstream tasks
  Paper • 2403.08540 • Published • 13
- Akashpb13/Swahili_xlsr
  Automatic Speech Recognition • Updated • 503 • 7

- SaulLM-7B: A pioneering Large Language Model for Law
  Paper • 2403.03883 • Published • 66
- Character-LLM: A Trainable Agent for Role-Playing
  Paper • 2310.10158 • Published • 1
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 62
- RakutenAI-7B: Extending Large Language Models for Japanese
  Paper • 2403.15484 • Published • 12

- Measuring the Effects of Data Parallelism on Neural Network Training
  Paper • 1811.03600 • Published • 2
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
  Paper • 1804.04235 • Published • 2
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  Paper • 1905.11946 • Published • 2
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 59

- Evaluating Very Long-Term Conversational Memory of LLM Agents
  Paper • 2402.17753 • Published • 17
- StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
  Paper • 2402.16671 • Published • 26
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
  Paper • 2402.16837 • Published • 24
- Divide-or-Conquer? Which Part Should You Distill Your LLM?
  Paper • 2402.15000 • Published • 22