Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features Paper • 1703.02507 • Published Mar 7, 2017
DoGE: Domain Reweighting with Generalization Estimation Paper • 2310.15393 • Published Oct 23, 2023 • 1
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging Paper • 2402.02622 • Published Feb 4, 2024 • 3
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28, 2024 • 12
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models Paper • 2311.16079 • Published Nov 27, 2023 • 20
Faster Causal Attention Over Large Sequences Through Sparse Flash Attention Paper • 2306.01160 • Published Jun 1, 2023 • 1