YAYI 2: Multilingual Open-Source Large Language Models Paper • 2312.14862 • Published Dec 22, 2023 • 11
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling Paper • 2312.15166 • Published Dec 23, 2023 • 55
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Paper • 2401.06066 • Published Jan 11, 2024 • 35
Composable Function-preserving Expansions for Transformer Architectures Paper • 2308.06103 • Published Aug 11, 2023 • 18
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment Paper • 2401.12474 • Published Jan 23, 2024 • 33
BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17, 2023 • 94
Specialized Language Models with Cheap Inference from Limited Domain Data Paper • 2402.01093 • Published Feb 2, 2024 • 45
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 61
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Paper • 2402.01739 • Published Jan 29, 2024 • 26
Rethinking Optimization and Architecture for Tiny Language Models Paper • 2402.02791 • Published Feb 5, 2024 • 12
A Tale of Tails: Model Collapse as a Change of Scaling Laws Paper • 2402.07043 • Published Feb 10, 2024 • 12
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model Paper • 2402.07827 • Published Feb 12, 2024 • 43
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models Paper • 2402.10986 • Published Feb 16, 2024 • 73
TEQ: Trainable Equivalent Transformation for Quantization of LLMs Paper • 2310.10944 • Published Oct 17, 2023 • 9
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models Paper • 2403.00818 • Published Feb 26, 2024 • 13