Randomized Positional Encodings Boost Length Generalization of Transformers • arXiv:2305.16843 • Published May 26, 2023
Mindstorms in Natural Language-Based Societies of Mind • arXiv:2305.17066 • Published May 26, 2023
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models • arXiv:2410.20771 • Published Oct 28, 2024
A Modern Self-Referential Weight Matrix That Learns to Modify Itself • arXiv:2202.05780 • Published Feb 11, 2022
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention • arXiv:2312.07987 • Published Dec 13, 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers • arXiv:2310.10837 • Published Oct 16, 2023