Papers - Custom Layers (a collection by matlok)
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning (arXiv:2310.20587)
- JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention (arXiv:2310.00535)
- Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla (arXiv:2307.09458)
- The Impact of Depth and Width on Transformer Language Model Generalization (arXiv:2310.19956)
- Veagle: Advancements in Multimodal Representation Learning (arXiv:2403.08773)
- Hash Layers For Large Sparse Models (arXiv:2106.04426)
- Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers (arXiv:2311.10642)
- DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging (arXiv:2402.02622)
- The Unreasonable Ineffectiveness of the Deeper Layers (arXiv:2403.17887)
- Lumiere: A Space-Time Diffusion Model for Video Generation (arXiv:2401.12945)
- RWKV: Reinventing RNNs for the Transformer Era (arXiv:2305.13048)
- Condition-Aware Neural Network for Controlled Image Generation (arXiv:2404.01143)
- Locating and Editing Factual Associations in GPT (arXiv:2202.05262)
- MLP Can Be A Good Transformer Learner (arXiv:2404.05657)
- Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective (arXiv:2404.07200)
- MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs (arXiv:2402.15627)
- Scaling MLPs: A Tale of Inductive Bias (arXiv:2306.13575)
- GLIGEN: Open-Set Grounded Text-to-Image Generation (arXiv:2301.07093)
- All you need is a good init (arXiv:1511.06422)
- LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding (arXiv:2404.16710)
- Is Bigger Edit Batch Size Always Better? An Empirical Study on Model Editing with Llama-3 (arXiv:2405.00664)
- pyvene: A Library for Understanding and Improving PyTorch Models via Interventions (arXiv:2403.07809)
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (arXiv:2410.23168)
- Augmenting Self-attention with Persistent Memory (arXiv:1907.01470)