-
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
Paper • 2312.06134 • Published • 2 -
Efficient Monotonic Multihead Attention
Paper • 2312.04515 • Published • 6 -
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 37 -
Exploring Format Consistency for Instruction Tuning
Paper • 2307.15504 • Published • 7
Collections
Discover the best community collections!
Collections including paper arxiv:2310.15916
-
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 34 -
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 88 -
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 49 -
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Paper • 2402.05099 • Published • 18
-
A Language Model's Guide Through Latent Space
Paper • 2402.14433 • Published • 1 -
The Hidden Space of Transformer Language Adapters
Paper • 2402.13137 • Published -
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
Paper • 2402.16438 • Published -
AtP*: An efficient and scalable method for localizing LLM behaviour to components
Paper • 2403.00745 • Published • 11
-
In-Context Learning Creates Task Vectors
Paper • 2310.15916 • Published • 41 -
Point Transformer V3: Simpler, Faster, Stronger
Paper • 2312.10035 • Published • 17 -
Extending Context Window of Large Language Models via Semantic Compression
Paper • 2312.09571 • Published • 12 -
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
Paper • 2312.17276 • Published • 15
-
In-Context Learning Creates Task Vectors
Paper • 2310.15916 • Published • 41 -
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
Paper • 2310.12921 • Published • 19 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 53
-
In-Context Learning Creates Task Vectors
Paper • 2310.15916 • Published • 41 -
When can transformers reason with abstract symbols?
Paper • 2310.09753 • Published • 2 -
Improving Length-Generalization in Transformers via Task Hinting
Paper • 2310.00726 • Published • 1 -
In-context Autoencoder for Context Compression in a Large Language Model
Paper • 2307.06945 • Published • 27