- Trellis Networks for Sequence Modeling
  Paper • 1810.06682 • Published • 1
- ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models
  Paper • 2311.01981 • Published • 1
- Gated recurrent neural networks discover attention
  Paper • 2309.01775 • Published • 7
- Inverse Approximation Theory for Nonlinear Recurrent Neural Networks
  Paper • 2305.19190 • Published • 1

Collections including paper arxiv:2312.00752

- UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
  Paper • 2311.09257 • Published • 45
- Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
  Paper • 2310.04378 • Published • 19
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 44
- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 118

- The Impact of Depth and Width on Transformer Language Model Generalization
  Paper • 2310.19956 • Published • 9
- Retentive Network: A Successor to Transformer for Large Language Models
  Paper • 2307.08621 • Published • 170
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 14
- Attention Is All You Need
  Paper • 1706.03762 • Published • 44

- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
  Paper • 2108.12409 • Published • 5
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 65
- MIMIC-IT: Multi-Modal In-Context Instruction Tuning
  Paper • 2306.05425 • Published • 11
- Music ControlNet: Multiple Time-varying Controls for Music Generation
  Paper • 2311.07069 • Published • 43

- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 157
- Mistral 7B
  Paper • 2310.06825 • Published • 47

- Llemma: An Open Language Model For Mathematics
  Paper • 2310.10631 • Published • 50
- Mistral 7B
  Paper • 2310.06825 • Published • 47
- Qwen Technical Report
  Paper • 2309.16609 • Published • 34
- BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
  Paper • 2309.11568 • Published • 10

- When can transformers reason with abstract symbols?
  Paper • 2310.09753 • Published • 2
- In-Context Pretraining: Language Modeling Beyond Document Boundaries
  Paper • 2310.10638 • Published • 28
- Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
  Paper • 2310.09520 • Published • 10
- Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
  Paper • 2309.08532 • Published • 52

- Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts
  Paper • 2309.07430 • Published • 27
- MindAgent: Emergent Gaming Interaction
  Paper • 2309.09971 • Published • 11
- Cure the headache of Transformers via Collinear Constrained Attention
  Paper • 2309.08646 • Published • 12
- Contrastive Decoding Improves Reasoning in Large Language Models
  Paper • 2309.09117 • Published • 37

- Uncovering mesa-optimization algorithms in Transformers
  Paper • 2309.05858 • Published • 12
- ProPainter: Improving Propagation and Transformer for Video Inpainting
  Paper • 2309.03897 • Published • 26
- Approximating Two-Layer Feedforward Networks for Efficient Transformers
  Paper • 2310.10837 • Published • 10
- CLEX: Continuous Length Extrapolation for Large Language Models
  Paper • 2310.16450 • Published • 9

- Large Language Models as Optimizers
  Paper • 2309.03409 • Published • 75
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
  Paper • 2309.04269 • Published • 32
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 87
- Efficient Memory Management for Large Language Model Serving with PagedAttention
  Paper • 2309.06180 • Published • 25