-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 135 -
ReFT: Reasoning with Reinforced Fine-Tuning
Paper • 2401.08967 • Published • 26 -
Tuning Language Models by Proxy
Paper • 2401.08565 • Published • 19 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 62
Collections
Discover the best community collections!
Collections including paper arxiv:2401.02954
-
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 69 -
Gemma: Open Models Based on Gemini Research and Technology
Paper • 2403.08295 • Published • 41 -
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Paper • 2403.08763 • Published • 48 -
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 38
-
Yi: Open Foundation Models by 01.AI
Paper • 2403.04652 • Published • 57 -
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 38 -
Qwen Technical Report
Paper • 2309.16609 • Published • 30 -
Gemma: Open Models Based on Gemini Research and Technology
Paper • 2403.08295 • Published • 41
-
Rethinking Optimization and Architecture for Tiny Language Models
Paper • 2402.02791 • Published • 12 -
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 45 -
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Paper • 2401.17574 • Published • 14 -
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 59
-
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 38 -
Qwen Technical Report
Paper • 2309.16609 • Published • 30 -
GPT-4 Technical Report
Paper • 2303.08774 • Published • 3 -
Gemini: A Family of Highly Capable Multimodal Models
Paper • 2312.11805 • Published • 44
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 172 -
Learning Vision from Models Rivals Learning Vision from Data
Paper • 2312.17742 • Published • 12 -
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
Paper • 2312.17276 • Published • 14 -
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Paper • 2401.02669 • Published • 10
-
Attention Is All You Need
Paper • 1706.03762 • Published • 34 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 11 -
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 7 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 9
-
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
Paper • 2310.18356 • Published • 22 -
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper • 2401.01325 • Published • 24 -
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 38