add resources 1
Bunch of resources that I like. I will create a discussion tab to encourage people to suggest the best resources when we release the blog.
- src/index.html +51 -1
src/index.html
CHANGED
@@ -2380,6 +2380,11 @@
|
|
2380 |
<a href="https://arxiv.org/abs/2312.11805"><strong>Gemini</strong></a>
|
2381 |
<p>Presents Google's multimodal model architecture capable of processing text, images, audio, and video inputs.</p>
|
2382 |
</div>
|
2383 |
|
2384 |
<div>
|
2385 |
<a href="https://arxiv.org/abs/2412.19437v1"><strong>DeepSeek-V3</strong></a>
|
@@ -2388,7 +2393,6 @@
|
|
2388 |
|
2389 |
|
2390 |
<h3>Training Frameworks</h3>
|
2391 |
-
|
2392 |
<div>
|
2393 |
<a href="https://github.com/facebookresearch/fairscale/tree/main"><strong>FairScale</strong></a>
|
2394 |
<p>PyTorch extension library for large-scale training, offering various parallelism and optimization techniques.</p>
|
@@ -2441,6 +2445,11 @@
|
|
2441 |
<p>Comprehensive guide to understanding and optimizing GPU memory usage in PyTorch.</p>
|
2442 |
</div>
|
2443 |
|
2444 |
<div>
|
2445 |
<a href="https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html"><strong>TensorBoard Profiler Tutorial</strong></a>
|
2446 |
<p>Guide to using TensorBoard's profiling tools for PyTorch models.</p>
|
@@ -2502,6 +2511,11 @@
|
|
2502 |
<a href="https://arxiv.org/abs/1710.03740"><strong>Mixed precision training</strong></a>
|
2503 |
<p>Introduces mixed precision training techniques for deep learning models.</p>
|
2504 |
</div>
|
2505 |
|
2506 |
<h3>Hardware</h3>
|
2507 |
|
@@ -2519,6 +2533,11 @@
|
|
2519 |
<a href="https://www.semianalysis.com/p/100000-h100-clusters-power-network"><strong>Semianalysis - 100k H100 cluster</strong></a>
|
2520 |
<p>Analysis of large-scale H100 GPU clusters and their implications for AI infrastructure.</p>
|
2521 |
</div>
|
2522 |
|
2523 |
<h3>Others</h3>
|
2524 |
|
@@ -2546,6 +2565,37 @@
|
|
2546 |
<a href="https://www.harmdevries.com/post/context-length/"><strong>Harm's blog for long context</strong></a>
|
2547 |
<p>Investigation into long context training in terms of data and training cost.</p>
|
2548 |
</div>
|
2549 |
|
2550 |
<h2>Appendix</h2>
|
2551 |
|
|
|
2380 |
<a href="https://arxiv.org/abs/2312.11805"><strong>Gemini</strong></a>
|
2381 |
<p>Presents Google's multimodal model architecture capable of processing text, images, audio, and video inputs.</p>
|
2382 |
</div>
|
2383 |
+
|
2384 |
+
<div>
|
2385 |
+
<a href="https://arxiv.org/abs/2407.21783"><strong>Llama 3</strong></a>
|
2386 |
+
<p>The Llama 3 Herd of Models</p>
|
2387 |
+
</div>
|
2388 |
|
2389 |
<div>
|
2390 |
<a href="https://arxiv.org/abs/2412.19437v1"><strong>DeepSeek-V3</strong></a>
|
|
|
2393 |
|
2394 |
|
2395 |
<h3>Training Frameworks</h3>
|
|
|
2396 |
<div>
|
2397 |
<a href="https://github.com/facebookresearch/fairscale/tree/main"><strong>FairScale</strong></a>
|
2398 |
<p>PyTorch extension library for large-scale training, offering various parallelism and optimization techniques.</p>
|
|
|
2445 |
<p>Comprehensive guide to understanding and optimizing GPU memory usage in PyTorch.</p>
|
2446 |
</div>
|
2447 |
|
2448 |
+
<div>
|
2449 |
+
<a href="https://huggingface.co/blog/train_memory"><strong>Memory profiling walkthrough on a simple example</strong></a>
|
2450 |
+
<p>Visualize and understand GPU memory in PyTorch.</p>
|
2451 |
+
</div>
|
2452 |
+
|
2453 |
<div>
|
2454 |
<a href="https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html"><strong>TensorBoard Profiler Tutorial</strong></a>
|
2455 |
<p>Guide to using TensorBoard's profiling tools for PyTorch models.</p>
|
|
|
2511 |
<a href="https://arxiv.org/abs/1710.03740"><strong>Mixed precision training</strong></a>
|
2512 |
<p>Introduces mixed precision training techniques for deep learning models.</p>
|
2513 |
</div>
|
2514 |
+
|
2515 |
+
<div>
|
2516 |
+
<a href="https://main-horse.github.io/posts/visualizing-6d/"><strong>@main_horse blog</strong></a>
|
2517 |
+
<p>Visualizing 6D Mesh Parallelism</p>
|
2518 |
+
</div>
|
2519 |
|
2520 |
<h3>Hardware</h3>
|
2521 |
|
|
|
2533 |
<a href="https://www.semianalysis.com/p/100000-h100-clusters-power-network"><strong>Semianalysis - 100k H100 cluster</strong></a>
|
2534 |
<p>Analysis of large-scale H100 GPU clusters and their implications for AI infrastructure.</p>
|
2535 |
</div>
|
2536 |
+
|
2537 |
+
<div>
|
2538 |
+
<a href="https://modal.com/gpu-glossary/readme"><strong>Modal GPU Glossary</strong></a>
|
2539 |
+
<p>CUDA docs for humans</p>
|
2540 |
+
</div>
|
2541 |
|
2542 |
<h3>Others</h3>
|
2543 |
|
|
|
2565 |
<a href="https://www.harmdevries.com/post/context-length/"><strong>Harm's blog for long context</strong></a>
|
2566 |
<p>Investigation into long context training in terms of data and training cost.</p>
|
2567 |
</div>
|
2568 |
+
|
2569 |
+
<div>
|
2570 |
+
<a href="https://www.youtube.com/@GPUMODE/videos"><strong>GPU Mode</strong></a>
|
2571 |
+
<p>A GPU reading group and community.</p>
|
2572 |
+
</div>
|
2573 |
+
|
2574 |
+
<div>
|
2575 |
+
<a href="https://youtube.com/playlist?list=PLvtrkEledFjqOLuDB_9FWL3dgivYqc6-3&si=fKWPotx8BflLAUkf"><strong>EleutherAI YouTube channel</strong></a>
|
2576 |
+
<p>ML Scalability & Performance Reading Group</p>
|
2577 |
+
</div>
|
2578 |
+
|
2579 |
+
<div>
|
2580 |
+
<a href="https://jax-ml.github.io/scaling-book/"><strong>Google JAX Scaling book</strong></a>
|
2581 |
+
<p>How to Scale Your Model</p>
|
2582 |
+
</div>
|
2583 |
+
|
2584 |
+
<div>
|
2585 |
+
<a href="https://github.com/facebookresearch/capi/blob/main/fsdp.py"><strong>@fvsmassa & @TimDarcet FSDP</strong></a>
|
2586 |
+
<p>Standalone ~500 LoC FSDP implementation</p>
|
2587 |
+
</div>
|
2588 |
+
|
2589 |
+
<div>
|
2590 |
+
<a href="https://www.thonking.ai/"><strong>thonking.ai</strong></a>
|
2591 |
+
<p>Some of Horace He's blog posts</p>
|
2592 |
+
</div>
|
2593 |
+
|
2594 |
+
<div>
|
2595 |
+
<a href="https://gordicaleksa.medium.com/eli5-flash-attention-5c44017022ad"><strong>Aleksa's ELI5 Flash Attention</strong></a>
|
2596 |
+
<p>Easy explanation of Flash Attention</p>
|
2597 |
+
</div>
|
2598 |
+
|
2599 |
|
2600 |
<h2>Appendix</h2>
|
2601 |
|