Recurrent Context Compression: Efficiently Expanding the Context Window of LLM Paper • 2406.06110 • Published 5 days ago
SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models Paper • 2406.05678 • Published 6 days ago
XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference Paper • 2405.17755 • Published 18 days ago
Length Generalization of Causal Transformers without Position Encoding Paper • 2404.12224 • Published Apr 18
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory Paper • 2404.11163 • Published Apr 17
Universal In-Context Approximation By Prompting Fully Recurrent Models Paper • 2406.01424 • Published 11 days ago
A Unified Implicit Attention Formulation for Gated-Linear Recurrent Sequence Models Paper • 2405.16504 • Published 20 days ago
Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences Paper • 2406.08128 • Published 3 days ago
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks Paper • 2405.15731 • Published 21 days ago
State-Free Inference of State-Space Models: The Transfer Function Approach Paper • 2405.06147 • Published May 10
LoCoCo: Dropping In Convolutions for Long Context Compression Paper • 2406.05317 • Published 7 days ago
Parallelizing Linear Transformers with the Delta Rule over Sequence Length Paper • 2406.06484 • Published 4 days ago
LongSSM: On the Length Extension of State-space Models in Language Modelling Paper • 2406.02080 • Published 11 days ago
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Paper • 2406.07522 • Published 3 days ago
Multilingual Large Language Models Are Not (Yet) Code-Switchers Paper • 2305.14235 • Published May 23, 2023
Do Llamas Work in English? On the Latent Language of Multilingual Transformers Paper • 2402.10588 • Published Feb 16
Integrating Multi-scale Contextualized Information for Byte-based Neural Machine Translation Paper • 2405.19290 • Published 16 days ago
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks Paper • 1905.11946 • Published May 28, 2019
SpaceByte: Towards Deleting Tokenization from Large Language Modeling Paper • 2404.14408 • Published Apr 22
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters Paper • 2405.16287 • Published 20 days ago
D'OH: Decoder-Only random Hypernetworks for Implicit Neural Representations Paper • 2403.19163 • Published Mar 28
Byte-Level Recursive Convolutional Auto-Encoder for Text Paper • 1802.01817 • Published Feb 6, 2018
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers Paper • 2305.07185 • Published May 12, 2023
Prompting-based Synthetic Data Generation for Few-Shot Question Answering Paper • 2405.09335 • Published about 1 month ago
Enhancing Conversational Search: Large Language Model-Aided Informative Query Rewriting Paper • 2310.09716 • Published Oct 15, 2023
TarGEN: Targeted Data Generation with Large Language Models Paper • 2310.17876 • Published Oct 27, 2023
CrossTune: Black-Box Few-Shot Classification with Label Enhancement Paper • 2403.12468 • Published Mar 19
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation Paper • 2405.17057 • Published 19 days ago
SemCoder: Training Code Language Models with Comprehensive Semantics Paper • 2406.01006 • Published 12 days ago
NExT: Teaching Large Language Models to Reason about Code Execution Paper • 2404.14662 • Published Apr 23
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs Paper • 2405.16325 • Published 20 days ago
VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks Paper • 2405.15179 • Published 22 days ago
ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections Paper • 2405.20271 • Published 15 days ago
SLTrain: A Sparse Plus Low-Rank Approach for Parameter and Memory Efficient Pretraining Paper • 2406.02214 • Published 11 days ago
SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors Paper • 2405.19597 • Published 16 days ago
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters Paper • 2405.17604 • Published 18 days ago
Parameter-Efficient Fine-Tuning with Discrete Fourier Transform Paper • 2405.03003 • Published May 5
NOLA: Networks as Linear Combination of Low Rank Random Basis Paper • 2310.02556 • Published Oct 4, 2023
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model Paper • 2405.20222 • Published 15 days ago
CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers Paper • 2405.13195 • Published 24 days ago
CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory Paper • 2402.13449 • Published Feb 21
Self-Selected Attention Span for Accelerating Large Language Model Inference Paper • 2404.09336 • Published Apr 14