RachidAR's Collections
Ternary LLMs & Knowledge distillation & SOTA
Addition is All You Need for Energy-efficient Language Models
Paper • 2410.00907 • Published • 145
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 608 (ternary absmean quantization; a sketch follows this list)
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Paper • 2404.16710 • Published • 77
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Paper • 2405.08707 • Published • 31
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Paper • 2308.06744 • Published • 1
TerDiT: Ternary Diffusion Models with Transformers
Paper • 2405.14854 • Published • 2
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 30
You Only Cache Once: Decoder-Decoder Architectures for Language Models
Paper • 2405.05254 • Published • 10
Differential Transformer
Paper • 2410.05258 • Published • 171
BitNet a4.8: 4-bit Activations for 1-bit LLMs
Paper • 2411.04965 • Published • 66
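
As context for the collection's ternary theme, here is a minimal NumPy sketch of the absmean weight quantization described in "The Era of 1-bit LLMs" (2402.17764): weights are scaled by their mean absolute value, then rounded and clipped to {-1, 0, +1}, so a matrix multiply reduces to additions and subtractions plus one scalar multiply. The function name and the toy example below are illustrative assumptions, not code from the paper.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, +1} with the absmean scheme
    sketched in arXiv:2402.17764, so that w ~= gamma * w_ternary.
    (Hypothetical helper name, not from the paper's code.)"""
    gamma = np.abs(w).mean()                         # absmean scale over the matrix
    w_scaled = w / (gamma + eps)                     # normalize by the scale
    w_ternary = np.clip(np.round(w_scaled), -1, 1)   # RoundClip to {-1, 0, +1}
    return w_ternary.astype(np.int8), gamma

if __name__ == "__main__":
    # Toy check: ternary matmul plus a single scalar rescale
    # approximates the full-precision matmul.
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 8)).astype(np.float32)
    x = rng.normal(size=(8,)).astype(np.float32)
    w_t, gamma = absmean_ternary_quantize(w)
    y_approx = gamma * (w_t.astype(np.float32) @ x)
    y_exact = w @ x
    print("mean abs error:", np.abs(y_approx - y_exact).mean())
```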