INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers Paper • 2307.03712 • Published Jul 7, 2023
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters Paper • 2408.04093 • Published Aug 7, 2024
Arcee's MergeKit: A Toolkit for Merging Large Language Models Paper • 2403.13257 • Published Mar 20, 2024
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024
m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers Paper • 2402.16918 • Published Feb 26, 2024