U-Net: Convolutional Networks for Biomedical Image Segmentation Paper • 1505.04597 • Published May 18, 2015 • 5
A decoder-only foundation model for time-series forecasting Paper • 2310.10688 • Published Oct 14, 2023 • 4
Mish: A Self Regularized Non-Monotonic Activation Function Paper • 1908.08681 • Published Aug 23, 2019 • 1
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024 • 119
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22, 2024 • 81
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 566
LLaMA: Open and Efficient Foundation Language Models Paper • 2302.13971 • Published Feb 27, 2023 • 11
Learning Transferable Visual Models From Natural Language Supervision Paper • 2103.00020 • Published Feb 26, 2021 • 7
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Paper • 2101.03961 • Published Jan 11, 2021 • 13
The Pile: An 800GB Dataset of Diverse Text for Language Modeling Paper • 2101.00027 • Published Dec 31, 2020 • 6
SQuAD: 100,000+ Questions for Machine Comprehension of Text Paper • 1606.05250 • Published Jun 16, 2016 • 3
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter Paper • 1910.01108 • Published Oct 2, 2019 • 10
RoBERTa: A Robustly Optimized BERT Pretraining Approach Paper • 1907.11692 • Published Jul 26, 2019 • 7
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Paper • 1810.04805 • Published Oct 11, 2018 • 11
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Paper • 1804.07461 • Published Apr 20, 2018 • 4
HiPPO: Recurrent Memory with Optimal Polynomial Projections Paper • 2008.07669 • Published Aug 17, 2020 • 1
papers Collection a growing collection of arXiv papers read while learning ML 📜 • 32 items • Updated 3 days ago • 1
Transformers.js demos Collection A collection of my favorite WebML demos, built with Transformers.js! • 23 items • Updated 13 days ago • 35
AR-Net: A simple Auto-Regressive Neural Network for time-series Paper • 1911.12436 • Published Nov 27, 2019 • 1