Collections
Collections including paper arxiv:2305.13245
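The shared paper, arxiv:2305.13245, introduces grouped-query attention (GQA): groups of query heads share a single key/value head, shrinking the KV cache at decode time. Below is a minimal PyTorch sketch of the idea, not the paper's reference code; the function name, weight shapes, and the toy dimensions at the bottom are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, w_q, w_k, w_v, w_o, n_q_heads, n_kv_heads):
    """Sketch of grouped-query attention (arXiv:2305.13245).

    Each group of n_q_heads // n_kv_heads query heads shares one KV head.
    n_kv_heads == 1 recovers multi-query attention (arXiv:1911.02150);
    n_kv_heads == n_q_heads recovers standard multi-head attention.
    """
    B, T, D = x.shape
    head_dim = D // n_q_heads
    group = n_q_heads // n_kv_heads

    # Project and split into heads: (B, heads, T, head_dim).
    q = (x @ w_q).view(B, T, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ w_k).view(B, T, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ w_v).view(B, T, n_kv_heads, head_dim).transpose(1, 2)

    # Broadcast each KV head across its group of query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(B, T, D) @ w_o

# Toy usage: 8 query heads sharing 2 KV heads (group size 4).
B, T, D, n_q, n_kv = 2, 16, 64, 8, 2
hd = D // n_q
x = torch.randn(B, T, D)
w_q, w_o = torch.randn(D, D), torch.randn(D, D)
w_k, w_v = torch.randn(D, n_kv * hd), torch.randn(D, n_kv * hd)
y = grouped_query_attention(x, w_q, w_k, w_v, w_o, n_q, n_kv)  # (2, 16, 64)
```

Only the K/V projections shrink (D × n_kv·head_dim instead of D × D), so the memory saving shows up in the per-token KV cache rather than in compute.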
- Lumiere: A Space-Time Diffusion Model for Video Generation
  Paper • 2401.12945 • Published • 82
- Long-form factuality in large language models
  Paper • 2403.18802 • Published • 23
- ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
  Paper • 2403.18818 • Published • 22
- TC4D: Trajectory-Conditioned Text-to-4D Generation
  Paper • 2403.17920 • Published • 15

- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
  Paper • 2402.14083 • Published • 43
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5
- Training a T5 Using Lab-sized Resources
  Paper • 2208.12097 • Published • 1
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
  Paper • 2212.05055 • Published • 4

- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5
- Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
  Paper • 2404.12387 • Published • 35
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
  Paper • 2404.14619 • Published • 120

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 119
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 45
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 9
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 62

- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
  Paper • 1804.04235 • Published • 2
- DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
  Paper • 2305.10429 • Published • 2
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5

- Measuring the Effects of Data Parallelism on Neural Network Training
  Paper • 1811.03600 • Published • 2
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
  Paper • 1804.04235 • Published • 2
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  Paper • 1905.11946 • Published • 2
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 59

- Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
  Paper • 2205.10350 • Published • 2
- Blockwise Parallel Decoding for Deep Autoregressive Models
  Paper • 1811.03115 • Published • 2
- Fast Transformer Decoding: One Write-Head is All You Need
  Paper • 1911.02150 • Published • 5
- Sequence-Level Knowledge Distillation
  Paper • 1606.07947 • Published • 2

- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 74
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 18
- Sequence Parallelism: Long Sequence Training from System Perspective
  Paper • 2105.13120 • Published • 5

- Detecting Pretraining Data from Large Language Models
  Paper • 2310.16789 • Published • 9
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
  Paper • 2310.13671 • Published • 17
- AutoMix: Automatically Mixing Language Models
  Paper • 2310.12963 • Published • 14
- An Emulator for Fine-Tuning Large Language Models using Small Language Models
  Paper • 2310.12962 • Published • 13