Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo Paper • 2503.09799 • Published 2 days ago • 8 • 1
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Paper • 2501.18512 • Published Jan 30 • 27 • 7
Eager Updates For Overlapped Communication and Computation in DiLoCo Paper • 2502.12996 • Published 24 days ago • 7 • 2
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 Paper • 2408.05147 • Published Aug 9, 2024 • 39
DiLoCo: Distributed Low-Communication Training of Language Models Paper • 2311.08105 • Published Nov 14, 2023 • 15
NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research Paper • 2211.11747 • Published Nov 15, 2022
PODNet: Pooled Outputs Distillation for Small-Tasks Incremental Learning Paper • 2004.13513 • Published Apr 28, 2020
WARP: On the Benefits of Weight Averaged Rewarded Policies Paper • 2406.16768 • Published Jun 24, 2024 • 23
Asynchronous Local-SGD Training for Language Modeling Paper • 2401.09135 • Published Jan 17, 2024 • 11