- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 17
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 1
Collections including paper arxiv:2403.03507
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
  Paper • 2503.12937 • Published • 26
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
  Paper • 2503.12605 • Published • 30
- Transformer^2: Self-adaptive LLMs
  Paper • 2501.06252 • Published • 54
- s1: Simple test-time scaling
  Paper • 2501.19393 • Published • 113
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 11
- AutoTrain: No-code training for state-of-the-art models
  Paper • 2410.15735 • Published • 60
- LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
  Paper • 2405.00732 • Published • 121
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 37
- SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
  Paper • 2408.15545 • Published • 36
- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 65
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 42
- Automated Design of Agentic Systems
  Paper • 2408.08435 • Published • 39
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 143
- Elucidating the Design Space of Diffusion-Based Generative Models
  Paper • 2206.00364 • Published • 16
- GLU Variants Improve Transformer
  Paper • 2002.05202 • Published • 3
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 138