Collections
Collections including paper arxiv:2403.17297
- InternLM2 Technical Report
  Paper • 2403.17297 • Published • 25
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 31
- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 79
- OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
  Paper • 2404.12195 • Published • 11

- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 15
- SELF: Language-Driven Self-Evolution for Large Language Model
  Paper • 2310.00533 • Published • 2
- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 39
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 43

- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 172
- RAFT: Adapting Language Model to Domain Specific RAG
  Paper • 2403.10131 • Published • 58
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
  Paper • 2403.13372 • Published • 50
- InternLM2 Technical Report
  Paper • 2403.17297 • Published • 25

- Measuring the Effects of Data Parallelism on Neural Network Training
  Paper • 1811.03600 • Published • 2
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
  Paper • 1804.04235 • Published • 2
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  Paper • 1905.11946 • Published • 2
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 57

- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 44
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 48
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 40
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 18

- Nemotron-4 15B Technical Report
  Paper • 2402.16819 • Published • 40
- InternLM2 Technical Report
  Paper • 2403.17297 • Published • 25
- Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
  Paper • 2404.04167 • Published • 8
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
  Paper • 2402.14905 • Published • 79

- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 104
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
  Paper • 2403.09629 • Published • 54
- Larimar: Large Language Models with Episodic Memory Control
  Paper • 2403.11901 • Published • 30
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 44