Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2310.16795

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models

Paper • 2310.16795 • Published Oct 25, 2023 • 26

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models

Paper • 2310.16795 • Published Oct 25, 2023 • 26
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Paper • 2308.12066 • Published Aug 23, 2023 • 4
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference

Paper • 2303.06182 • Published Mar 10, 2023 • 1
EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate

Paper • 2112.14397 • Published Dec 29, 2021 • 1

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

Paper • 2310.08659 • Published Oct 12, 2023 • 22
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

Paper • 2309.14717 • Published Sep 26, 2023 • 44
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models

Paper • 2309.02784 • Published Sep 6, 2023 • 1
ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers

Paper • 2309.16119 • Published Sep 28, 2023 • 1

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 96
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Paper • 2310.11511 • Published Oct 17, 2023 • 74
In-Context Learning Creates Task Vectors

Paper • 2310.15916 • Published Oct 24, 2023 • 41
Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 40

Approximating Two-Layer Feedforward Networks for Efficient Transformers

Paper • 2310.10837 • Published Oct 16, 2023 • 10
BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 96
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models

Paper • 2310.16795 • Published Oct 25, 2023 • 26
LLM-FP4: 4-Bit Floating-Point Quantized Transformers

Paper • 2310.16836 • Published Oct 25, 2023 • 13

interesting stuff

Chain-of-Verification Reduces Hallucination in Large Language Models

Paper • 2309.11495 • Published Sep 20, 2023 • 38
Adapting Large Language Models via Reading Comprehension

Paper • 2309.09530 • Published Sep 18, 2023 • 75
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Paper • 2309.09400 • Published Sep 17, 2023 • 82
Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 82

Agents: An Open-source Framework for Autonomous Language Agents

Paper • 2309.07870 • Published Sep 14, 2023 • 41
Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts

Paper • 2309.07430 • Published Sep 14, 2023 • 27
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

Paper • 2309.08532 • Published Sep 15, 2023 • 52
Investigating Answerability of LLMs for Long-Form Question Answering

Paper • 2309.08210 • Published Sep 15, 2023 • 12

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs