AgentGym: Evolving Large Language Model-based Agents across Diverse Environments Paper • 2406.04151 • Published Jun 6, 2024 • 22
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models Paper • 2403.12881 • Published Mar 19, 2024 • 18
AgentTuning: Enabling Generalized Agent Abilities for LLMs Paper • 2310.12823 • Published Oct 19, 2023 • 36
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach Paper • 2410.03160 • Published Oct 4, 2024 • 5
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 157
ReLU^2 Wins: Discovering Efficient Activation Functions for Sparse LLMs Paper • 2402.03804 • Published Feb 6, 2024 • 3
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU Paper • 2312.12456 • Published Dec 16, 2023 • 44
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models Paper • 2402.13516 • Published Feb 21, 2024 • 1
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model Paper • 2405.04434 • Published May 7, 2024 • 21
Neural Machine Translation by Jointly Learning to Align and Translate Paper • 1409.0473 • Published Sep 1, 2014 • 6
Effective Approaches to Attention-based Neural Machine Translation Paper • 1508.04025 • Published Aug 17, 2015 • 3
XAttention: Block Sparse Attention with Antidiagonal Scoring Paper • 2503.16428 • Published Mar 20, 2025 • 14
Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers? Paper • 2503.10632 • Published Mar 13, 2025 • 14
Slim attention: cut your context memory in half without loss of accuracy -- K-cache is all you need for MHA Paper • 2503.05840 • Published Mar 7, 2025 • 3
How to Reduce Memory Use in Reasoning Models Article • By Kseniase and 1 other • Mar 13, 2025 • 14
Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What’s Really Changing in Transformers? Article • By Kseniase and 1 other • Apr 4, 2025 • 14
MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing Paper • 2502.21291 • Published Feb 28, 2025 • 5