Collections including paper arxiv:2403.08295

- Measuring the Effects of Data Parallelism on Neural Network Training (Paper • 1811.03600 • Published • 2)
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost (Paper • 1804.04235 • Published • 2)
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (Paper • 1905.11946 • Published • 3)
- Yi: Open Foundation Models by 01.AI (Paper • 2403.04652 • Published • 59)

- Nemotron-4 15B Technical Report (Paper • 2402.16819 • Published • 40)
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (Paper • 2402.19427 • Published • 50)
- RWKV: Reinventing RNNs for the Transformer Era (Paper • 2305.13048 • Published • 10)
- Reformer: The Efficient Transformer (Paper • 2001.04451 • Published)

- Neural Network Diffusion (Paper • 2402.13144 • Published • 93)
- Genie: Generative Interactive Environments (Paper • 2402.15391 • Published • 67)
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models (Paper • 2402.17177 • Published • 87)
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks (Paper • 2403.00522 • Published • 40)

- Rethinking Optimization and Architecture for Tiny Language Models (Paper • 2402.02791 • Published • 12)
- More Agents Is All You Need (Paper • 2402.05120 • Published • 46)
- Scaling Laws for Forgetting When Fine-Tuning Large Language Models (Paper • 2401.05605 • Published)
- Aligning Large Language Models with Counterfactual DPO (Paper • 2401.09566 • Published • 2)

- LoRA: Low-Rank Adaptation of Large Language Models (Paper • 2106.09685 • Published • 24)
- Instruct-Imagen: Image Generation with Multi-modal Instruction (Paper • 2401.01952 • Published • 29)
- mistralai/Mixtral-8x7B-Instruct-v0.1 (Text Generation • Updated • 423k • 3.86k)
- Gemma: Open Models Based on Gemini Research and Technology (Paper • 2403.08295 • Published • 43)

- Attention Is All You Need (Paper • 1706.03762 • Published • 36)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Paper • 1810.04805 • Published • 11)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (Paper • 1907.11692 • Published • 7)
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (Paper • 1910.01108 • Published • 11)

- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (Paper • 2312.15166 • Published • 55)
- PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU (Paper • 2312.12456 • Published • 40)
- Cached Transformers: Improving Transformers with Differentiable Memory Cache (Paper • 2312.12742 • Published • 11)
- Mini-GPTs: Efficient Large Language Models through Contextual Pruning (Paper • 2312.12682 • Published • 7)