Collections
Collections including paper arxiv:2404.02258

Collection 1
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 102
- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 139
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 99
- Large Language Models Struggle to Learn Long-Tail Knowledge
  Paper • 2211.08411 • Published • 3

Collection 2
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 102
- Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
  Paper • 2403.20041 • Published • 34
- ViTAR: Vision Transformer with Any Resolution
  Paper • 2403.18361 • Published • 48

Collection 3
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 75
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 102
- ReFT: Representation Finetuning for Language Models
  Paper • 2404.03592 • Published • 74
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 58
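
Listings like the ones above can also be retrieved programmatically. The sketch below uses the huggingface_hub client's list_collections filter to find collections containing this paper; the "papers/<arxiv id>" item format is an assumption based on the Hub's URL namespaces, so verify it against the library documentation for your installed version.

```python
# Minimal sketch: list public Hub collections that include the
# Mixture-of-Depths paper (arxiv:2404.02258) via huggingface_hub.
# Assumption: the `item` filter accepts "papers/<arxiv id>"; check the
# list_collections docs for your huggingface_hub version before relying on it.
from huggingface_hub import list_collections

for collection in list_collections(item="papers/2404.02258", limit=10):
    print(collection.title, f"({collection.slug})")
    # Each returned collection carries a truncated preview of its items.
    for item in collection.items:
        if item.item_type == "paper":
            print("  Paper •", item.item_id)
```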