RachidAR's Collections: Papers
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • arXiv:2402.17764 • 605 upvotes
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection • arXiv:2403.03507 • 183 upvotes
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models • arXiv:2402.19427 • 52 upvotes
ResLoRA: Identity Residual Mapping in Low-Rank Adaption • arXiv:2402.18039 • 11 upvotes
Beyond Language Models: Byte Models are Digital World Simulators • arXiv:2402.19155 • 49 upvotes
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect • arXiv:2403.03853 • 61 upvotes
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers • arXiv:2402.19479 • 32 upvotes
DoRA: Weight-Decomposed Low-Rank Adaptation • arXiv:2402.09353 • 26 upvotes
Training Neural Networks from Scratch with Parallel Low-Rank Adapters • arXiv:2402.16828 • 3 upvotes
Mamba: Linear-Time Sequence Modeling with Selective State Spaces • arXiv:2312.00752 • 138 upvotes
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training • arXiv:2403.09611 • 125 upvotes
Simple and Scalable Strategies to Continually Pre-train Large Language Models • arXiv:2403.08763 • 49 upvotes
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory • arXiv:2402.04617 • 4 upvotes
Rho-1: Not All Tokens Are What You Need • arXiv:2404.07965 • 87 upvotes
Learn Your Reference Model for Real Good Alignment • arXiv:2404.09656 • 82 upvotes
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length • arXiv:2404.08801 • 64 upvotes
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models • arXiv:2403.03432 • 1 upvote
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone • arXiv:2404.14219 • 253 upvotes