- LoRA+: Efficient Low Rank Adaptation of Large Models
  Paper • 2402.12354 • Published • 5
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 13
- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 10
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 62
Collections
Collections including paper arxiv:2403.07816
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 37
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 26
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 46
- Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
  Paper • 2403.03432 • Published

- Gemma: Open Models Based on Gemini Research and Technology
  Paper • 2403.08295 • Published • 43
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
  Paper • 2403.13372 • Published • 57
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 37
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
  Paper • 2311.07463 • Published • 13

- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 37
- microsoft/phi-1_5
  Text Generation • Updated • 132k • 1.28k
- Language models scale reliably with over-training and on downstream tasks
  Paper • 2403.08540 • Published • 13
- Akashpb13/Swahili_xlsr
  Automatic Speech Recognition • Updated • 889 • 7

- Scaling Instruction-Finetuned Language Models
  Paper • 2210.11416 • Published • 5
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 131
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 50
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 59

- Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
  Paper • 2306.04845 • Published • 3
- Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks
  Paper • 2306.04073 • Published • 2
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 37
- Unified Scaling Laws for Routed Language Models
  Paper • 2202.01169 • Published • 2