Collections including paper arxiv:2404.02060

- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 62
- Ring Attention with Blockwise Transformers for Near-Infinite Context
  Paper • 2310.01889 • Published • 8
- World Model on Million-Length Video And Language With RingAttention
  Paper • 2402.08268 • Published • 33
- Scaling Transformer to 1M tokens and beyond with RMT
  Paper • 2304.11062 • Published • 2

- Compression Represents Intelligence Linearly
  Paper • 2404.09937 • Published • 27
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
  Paper • 2404.06395 • Published • 18
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 33
- Are large language models superhuman chemists?
  Paper • 2404.01475 • Published • 13

- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 33
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
  Paper • 2211.12588 • Published • 3
- StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
  Paper • 2402.16671 • Published • 26
- Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
  Paper • 2404.04167 • Published • 8

- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 13
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 33
- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
  Paper • 2312.17080 • Published • 1
- GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  Paper • 1804.07461 • Published • 4