Collections including paper arxiv:2402.01739

- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 37
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 26
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 46
- Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
  Paper • 2403.03432 • Published

- Turn Waste into Worth: Rectifying Top-k Router of MoE
  Paper • 2402.12399 • Published • 2
- CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
  Paper • 2402.02526 • Published • 3
- Buffer Overflow in Mixture of Experts
  Paper • 2402.05526 • Published • 8
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 26

- Non-asymptotic oracle inequalities for the Lasso in high-dimensional mixture of experts
  Paper • 2009.10622 • Published • 1
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 46
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 68
- MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving
  Paper • 2401.14361 • Published • 2

- Mixtral of Experts
  Paper • 2401.04088 • Published • 152
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 46
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 68
- EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
  Paper • 2308.14352 • Published

- LoRA+: Efficient Low Rank Adaptation of Large Models
  Paper • 2402.12354 • Published • 5
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 13
- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 10
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 62

- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 26
- Rethinking Interpretability in the Era of Large Language Models
  Paper • 2402.01761 • Published • 18
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 102
- Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
  Paper • 2402.07827 • Published • 43

- BlackMamba: Mixture of Experts for State-Space Models
  Paper • 2402.01771 • Published • 22
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 26
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 46
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 35

- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 48
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 102
- Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
  Paper • 2402.04248 • Published • 25
- Scaling Laws for Downstream Task Performance of Large Language Models
  Paper • 2402.04177 • Published • 16

- Rethinking Interpretability in the Era of Large Language Models
  Paper • 2402.01761 • Published • 18
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 26
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 94
- Stealing Part of a Production Language Model
  Paper • 2403.06634 • Published • 85

- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 26
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 61
- Rethinking Interpretability in the Era of Large Language Models
  Paper • 2402.01761 • Published • 18