
neuralink





neuralink's activity

New activity in nanotron/ultrascale-playbook 1 day ago

xrsrke/link_nanotron_fp8_appexdix

#21 opened 2 days ago by neuralink

xrsrke/fix_width_height_for_fp8_graph

#46 opened 1 day ago by neuralink

updated a Space 1 day ago

139

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLMs on large GPU clusters

New activity in nanotron/ultrascale-playbook 1 day ago

xrsrke/add_interactive_fp8_loss_curve

#43 opened 1 day ago by neuralink

upvoted an article 14 days ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

23 days ago • 761

upvoted an article 15 days ago

Article

Open-R1: Update #1

and 7 others • 18 days ago • 283

upvoted 2 papers about 1 month ago

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping

Paper • 2409.15241 • Published Sep 23, 2024 • 1

Scaling Laws for Floating Point Quantization Training

Paper • 2501.02423 • Published Jan 5 • 26

liked 2 Spaces 2 months ago

Scaling With Vocab Demo

📊

Predict optimal vocabulary size based on model parameters

Harm Space

⚡

liked a model 2 months ago

tencent/Tencent-Hunyuan-Large

Text Generation • Updated Jan 19 • 471 • 564

upvoted a paper 3 months ago

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 256

reacted to ArthurZ's post with 🔥 3 months ago

Post

3273

Native tensor parallel has landed in transformers!!! https://github.com/huggingface/transformers/pull/34184 thanks a lot to the torch team for their support!

Contributions are welcome to support more models! 🔥

liked a model 5 months ago

meta-llama/Llama-3.2-11B-Vision

Image-Text-to-Text • Updated Sep 27, 2024 • 225k • 469

updated 2 models 5 months ago

nanotron/temp_for_pr_review

Updated Sep 24, 2024

nanotron/fp8_for_nanotron

Updated Sep 21, 2024

upvoted a paper 5 months ago

Small-scale proxies for large-scale Transformer training instabilities

Paper • 2309.14322 • Published Sep 25, 2023 • 20

upvoted an article 6 months ago

Article

How NuminaMath Won the 1st AIMO Progress Prize

Jul 11, 2024 • 116

upvoted a paper 6 months ago

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

Paper • 2201.02177 • Published Jan 6, 2022 • 2

upvoted an article 6 months ago

Article

A failed experiment: Infini-Attention, and why we should keep trying?

Aug 14, 2024 • 58