Collections
Collections including paper arxiv:2402.09668
- Pre-training Small Base LMs with Fewer Tokens
  Paper • 2404.08634 • Published • 32
- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 16
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 34
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
  Paper • 2404.06395 • Published • 18

- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 69
- TinyGSM: achieving >80% on GSM8k with small language models
  Paper • 2312.09241 • Published • 33
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 34
- Let GPT be a Math Tutor: Teaching Math Word Problem Solvers with Customized Exercise Generation
  Paper • 2305.14386 • Published

- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
  Paper • 2404.14619 • Published • 120
- Scaling Laws for Downstream Task Performance of Large Language Models
  Paper • 2402.04177 • Published • 16
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 69
- Orca-Math: Unlocking the potential of SLMs in Grade School Math
  Paper • 2402.14830 • Published • 23

- Watermarking Makes Language Models Radioactive
  Paper • 2402.14904 • Published • 21
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 18
- GPTVQ: The Blessing of Dimensionality for LLM Quantization
  Paper • 2402.15319 • Published • 19
- DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation
  Paper • 2402.11929 • Published • 9

- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 34
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 69
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 175
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning
  Paper • 2403.02884 • Published • 14

- Effective pruning of web-scale datasets based on complexity of concept clusters
  Paper • 2401.04578 • Published
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 34
- A Survey on Data Selection for LLM Instruction Tuning
  Paper • 2402.05123 • Published • 3
- LESS: Selecting Influential Data for Targeted Instruction Tuning
  Paper • 2402.04333 • Published • 3