Collections including paper arxiv:2401.04088

abacusai/Smaug-72B-v0.1

Text Generation • Updated Feb 23 • 3.72k • 458
ReplaceAnything

Space • Running on A10G • 728

miqudev/miqu-1-70b

Updated Feb 4 • 16.1k • 973
fka/awesome-chatgpt-prompts

Viewer • Updated Mar 7, 2023 • 153 • 6.7k • 5.01k

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 135
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 154
Understanding LLMs: A Comprehensive Overview from Training to Inference

Paper • 2401.02038 • Published Jan 4 • 60

MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24 • 47
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Paper • 2401.10774 • Published Jan 19 • 50
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 135
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding

Paper • 2401.12954 • Published Jan 23 • 28

Jump Cut Smoothing for Talking Heads

Paper • 2401.04718 • Published Jan 9 • 16
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 154

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 135
ReFT: Reasoning with Reinforced Fine-Tuning

Paper • 2401.08967 • Published Jan 17 • 27
Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16 • 19
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10 • 62

mixture-of-experts

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Paper • 1701.06538 • Published Jan 23, 2017 • 4
Sparse Networks from Scratch: Faster Training without Losing Performance

Paper • 1907.04840 • Published Jul 10, 2019 • 3
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

Paper • 1910.02054 • Published Oct 4, 2019 • 3
A Mixture of h-1 Heads is Better than h Heads

Paper • 2005.06537 • Published May 13, 2020 • 2

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11 • 36
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 154
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM

Paper • 2401.02994 • Published Jan 4 • 45
LLM Augmented LLMs: Expanding Capabilities through Composition

Paper • 2401.02412 • Published Jan 4 • 35

llm-paper-reading

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 255
Magicoder: Source Code Is All You Need

Paper • 2312.02120 • Published Dec 4, 2023 • 78
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 154
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 91

TOFU: A Task of Fictitious Unlearning for LLMs

Paper • 2401.06121 • Published Jan 11 • 14
Secrets of RLHF in Large Language Models Part II: Reward Modeling

Paper • 2401.06080 • Published Jan 11 • 23
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 154

Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 154

© Hugging Face
