- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 16
- Mixtral of Experts
  Paper • 2401.04088 • Published • 154
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 68
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 36
Collections including paper arxiv:2401.04088
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
  Paper • 2211.05100 • Published • 25
- CsFEVER and CTKFacts: Acquiring Czech data for fact verification
  Paper • 2201.11115 • Published
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 12
- FinGPT: Large Generative Models for a Small Language
  Paper • 2311.05640 • Published • 26

- Attention Is All You Need
  Paper • 1706.03762 • Published • 39
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 6
- Mixtral of Experts
  Paper • 2401.04088 • Published • 154
- Mistral 7B
  Paper • 2310.06825 • Published • 45

- Llemma: An Open Language Model For Mathematics
  Paper • 2310.10631 • Published • 46
- Mistral 7B
  Paper • 2310.06825 • Published • 45
- Qwen Technical Report
  Paper • 2309.16609 • Published • 30
- BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
  Paper • 2309.11568 • Published • 9

- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
  Paper • 2310.16795 • Published • 26
- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
  Paper • 2310.13961 • Published • 4
- The Consensus Game: Language Model Generation via Equilibrium Search
  Paper • 2310.09139 • Published • 12
- Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
  Paper • 2310.03094 • Published • 12

- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
  Paper • 2310.03214 • Published • 14
- HeaP: Hierarchical Policies for Web Actions using LLMs
  Paper • 2310.03720 • Published • 5
- Large Language Models Cannot Self-Correct Reasoning Yet
  Paper • 2310.01798 • Published • 31
- Mixtral of Experts
  Paper • 2401.04088 • Published • 154

- SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous Driving
  Paper • 2402.02519 • Published
- Mixtral of Experts
  Paper • 2401.04088 • Published • 154
- Optimal Transport Aggregation for Visual Place Recognition
  Paper • 2311.15937 • Published
- GOAT: GO to Any Thing
  Paper • 2311.06430 • Published • 14

- MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
  Paper • 2310.09478 • Published • 17
- Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams
  Paper • 2310.08678 • Published • 11
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 237
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 11

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 575
- Mixtral of Experts
  Paper • 2401.04088 • Published • 154
- Mistral 7B
  Paper • 2310.06825 • Published • 45
- Don't Make Your LLM an Evaluation Benchmark Cheater
  Paper • 2311.01964 • Published • 1

- Chain-of-Verification Reduces Hallucination in Large Language Models
  Paper • 2309.11495 • Published • 37
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 73
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 77
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 81