- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 20
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 74
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 135
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 22
Collections including paper arxiv:2402.09668
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
  Paper • 2309.12307 • Published • 82
- NEFTune: Noisy Embeddings Improve Instruction Finetuning
  Paper • 2310.05914 • Published • 13
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 55
- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
  Paper • 2401.03462 • Published • 25
- Pre-training Small Base LMs with Fewer Tokens
  Paper • 2404.08634 • Published • 32
- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 16
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 33
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
  Paper • 2404.06395 • Published • 18
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 69
- TinyGSM: achieving >80% on GSM8k with small language models
  Paper • 2312.09241 • Published • 33
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 33
- Let GPT be a Math Tutor: Teaching Math Word Problem Solvers with Customized Exercise Generation
  Paper • 2305.14386 • Published
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
  Paper • 2404.14619 • Published • 118
- Scaling Laws for Downstream Task Performance of Large Language Models
  Paper • 2402.04177 • Published • 16
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 68
- Orca-Math: Unlocking the potential of SLMs in Grade School Math
  Paper • 2402.14830 • Published • 23
- Watermarking Makes Language Models Radioactive
  Paper • 2402.14904 • Published • 21
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 18
- GPTVQ: The Blessing of Dimensionality for LLM Quantization
  Paper • 2402.15319 • Published • 19
- DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation
  Paper • 2402.11929 • Published • 9