Approximating Two-Layer Feedforward Networks for Efficient Transformers Paper • 2310.10837 • Published Oct 16, 2023
BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17, 2023
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models Paper • 2310.16795 • Published Oct 25, 2023
LLM-FP4: 4-Bit Floating-Point Quantized Transformers Paper • 2310.16836 • Published Oct 25, 2023
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving Paper • 2310.19102 • Published Oct 29, 2023
Mini-GPTs: Efficient Large Language Models through Contextual Pruning Paper • 2312.12682 • Published Dec 20, 2023
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling Paper • 2312.15166 • Published Dec 23, 2023
SliceGPT: Compress Large Language Models by Deleting Rows and Columns Paper • 2401.15024 • Published Jan 26, 2024
Specialized Language Models with Cheap Inference from Limited Domain Data Paper • 2402.01093 • Published Feb 2, 2024
Rethinking Optimization and Architecture for Tiny Language Models Paper • 2402.02791 • Published Feb 5, 2024
Scaling Laws for Downstream Task Performance of Large Language Models Paper • 2402.04177 • Published Feb 6, 2024