Collections
Collections including paper arxiv:2404.16811
- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 87
- VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
  Paper • 2404.10667 • Published • 18
- Instruction-tuned Language Models are Better Knowledge Learners
  Paper • 2402.12847 • Published • 25
- DoRA: Weight-Decomposed Low-Rank Adaptation
  Paper • 2402.09353 • Published • 26

- Is Cosine-Similarity of Embeddings Really About Similarity?
  Paper • 2403.05440 • Published • 3
- GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning
  Paper • 2402.16829 • Published
- Make Your LLM Fully Utilize the Context
  Paper • 2404.16811 • Published • 53
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 108

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 125
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 50
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 12
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 65

- Simple and Scalable Strategies to Continually Pre-train Large Language Models
  Paper • 2403.08763 • Published • 49
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 104
- Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
  Paper • 2403.20041 • Published • 34
- Advancing LLM Reasoning Generalists with Preference Trees
  Paper • 2404.02078 • Published • 44

- Training-Free Long-Context Scaling of Large Language Models
  Paper • 2402.17463 • Published • 19
- Evaluating Very Long-Term Conversational Memory of LLM Agents
  Paper • 2402.17753 • Published • 18
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 22
- BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
  Paper • 2403.09347 • Published • 20

- InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory
  Paper • 2402.04617 • Published • 4
- BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
  Paper • 2403.09347 • Published • 20
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 22
- Training-Free Long-Context Scaling of Large Language Models
  Paper • 2402.17463 • Published • 19

- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
  Paper • 2401.03462 • Published • 27
- MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
  Paper • 2305.07185 • Published • 9
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 65
- Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
  Paper • 2401.02669 • Published • 14

- VideoBooth: Diffusion-based Video Generation with Image Prompts
  Paper • 2312.00777 • Published • 21
- MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
  Paper • 2312.03641 • Published • 20
- GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
  Paper • 2312.04557 • Published • 12
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
  Paper • 2312.04433 • Published • 9

- Statistical Rejection Sampling Improves Preference Optimization
  Paper • 2309.06657 • Published • 13
- In-Context Learning Creates Task Vectors
  Paper • 2310.15916 • Published • 42
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 64
- Make Your LLM Fully Utilize the Context
  Paper • 2404.16811 • Published • 53