Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2311.13171

MOE papers to read

Copied from MoE using https://huggingface.co/spaces/librarian-bots/collection_cloner.

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models

Paper • 2310.16795 • Published Oct 25, 2023 • 26
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Paper • 2308.12066 • Published Aug 23, 2023 • 4
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference

Paper • 2303.06182 • Published Mar 10, 2023 • 1
EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate

Paper • 2112.14397 • Published Dec 29, 2021 • 1

Experts Weights Averaging: A New General Training Scheme for Vision Transformers

Paper • 2308.06093 • Published Aug 11, 2023 • 2
Platypus: Quick, Cheap, and Powerful Refinement of LLMs

Paper • 2308.07317 • Published Aug 14, 2023 • 23
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers

Paper • 2211.11315 • Published Nov 21, 2022 • 1
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition

Paper • 2307.13269 • Published Jul 25, 2023 • 31

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

Paper • 2310.17157 • Published Oct 26, 2023 • 11
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

Paper • 2305.15805 • Published May 25, 2023 • 1
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt

Paper • 2305.11186 • Published May 17, 2023 • 1
Composable Sparse Fine-Tuning for Cross-Lingual Transfer

Paper • 2110.07560 • Published Oct 14, 2021 • 1

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models

Paper • 2310.16795 • Published Oct 25, 2023 • 26
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Paper • 2308.12066 • Published Aug 23, 2023 • 4
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference

Paper • 2303.06182 • Published Mar 10, 2023 • 1
EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate

Paper • 2112.14397 • Published Dec 29, 2021 • 1

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

Paper • 2310.08659 • Published Oct 12, 2023 • 22
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

Paper • 2309.14717 • Published Sep 26, 2023 • 44
ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers

Paper • 2309.16119 • Published Sep 28, 2023 • 1
LoRA ensembles for large language model fine-tuning

Paper • 2310.00035 • Published Sep 29, 2023 • 2

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs