xiaolinz's Collections

DiLoCo

updated Mar 26

  • Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch

    Paper • 2501.18512 • Published Jan 30 • 30

  • DiLoCo: Distributed Low-Communication Training of Language Models

    Paper • 2311.08105 • Published Nov 14, 2023 • 15

  • Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo

    Paper • 2503.09799 • Published Mar 12 • 14

  • Muon is Scalable for LLM Training

    Paper • 2502.16982 • Published Feb 24 • 1

  • (Mis)Fitting: A Survey of Scaling Laws

    Paper • 2502.18969 • Published Feb 26

  • Scaling Language Models: Methods, Analysis & Insights from Training Gopher

    Paper • 2112.11446 • Published Dec 8, 2021 • 1

  • Scaling Laws for Floating Point Quantization Training

    Paper • 2501.02423 • Published Jan 5 • 27

  • Survey on Evaluation of LLM-based Agents

    Paper • 2503.16416 • Published Mar 20 • 90