t.d.a.g. PRO

sequelbox

AI & ML interests

open source, infinite games. (they/them)

Recent Activity

reacted to m-ric's post with 👀 1 day ago
liked a dataset 9 days ago
jondurbin/airoboros-2.2
liked a dataset 9 days ago
microsoft/orca-math-word-problems-200k

Organizations

Valiant Labs

sequelbox's activity

reacted to m-ric's post with 👀 1 day ago
Hugging Face releases Picotron, a microscopic lib that solves LLM training 4D parallelization 🥳

🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years.

👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons."

🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
And it is infamously hard to do, making for bloated code repos that hold together only by magic.
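
To make the four dimensions concrete, here is a minimal sketch (not Picotron's or Nanotron's actual API; the group sizes and dimension ordering are assumptions for a toy 16-GPU example) of how a flat GPU rank decomposes into data/tensor/context/pipeline coordinates:

# Illustration only: decompose a flat GPU rank into one coordinate per parallelism
# dimension. Group sizes and ordering are assumptions for a toy 16-GPU example,
# not the layout used by Picotron or for Llama-3.1-405B.
DP, PP, CP, TP = 2, 2, 2, 2            # data / pipeline / context / tensor group sizes
WORLD_SIZE = DP * PP * CP * TP         # 16 GPUs in this toy example

def rank_to_coords(rank: int) -> dict:
    tp = rank % TP                      # fastest-varying: tensor-parallel peers
    cp = (rank // TP) % CP
    pp = (rank // (TP * CP)) % PP
    dp = rank // (TP * CP * PP)         # slowest-varying: data-parallel replicas
    return {"dp": dp, "pp": pp, "cp": cp, "tp": tp}

for r in range(WORLD_SIZE):
    print(r, rank_to_coords(r))

Each GPU then holds only its shard of the global batch (dp), of the weight matrices (tp), of the sequence (cp), and of the layer stack (pp).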

🤏 But now we don't need huge repos anymore! Instead of building mega-training codes, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. A team has built Nanotron, already widely used in industry.
And now a team releases Picotron, a radical approach that codes 4D parallelism in just a few hundred lines of code, a real feat of engineering that makes it much easier to understand what's actually happening!

⚡ It's tiny, yet powerful:
Measured in MFU (Model FLOPs Utilization, i.e. how much of the hardware's compute potential the model actually uses), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, which is really close to what the huge libs reach. (Caution: the team is running further benchmarks to verify this)
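
For intuition, a back-of-the-envelope version of that MFU number (the throughput figure and the per-GPU peak below are assumptions for illustration, not measured Picotron results):

# Rough MFU estimate; all numbers here are illustrative assumptions, not measurements.
n_params = 1.7e9             # SmolLM-1.7B parameter count
tokens_per_sec = 390_000     # assumed aggregate training throughput over 8 GPUs
n_gpus = 8
peak_flops_per_gpu = 989e12  # approx. H100 SXM dense BF16 peak, FLOPs/s

# Common approximation: ~6 FLOPs per parameter per trained token (forward + backward).
achieved_flops_per_sec = 6 * n_params * tokens_per_sec
mfu = achieved_flops_per_sec / (n_gpus * peak_flops_per_gpu)
print(f"MFU ≈ {mfu:.0%}")    # ≈ 50% with these assumed numbers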

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron
reacted to takarajordan's post with ❤️ 9 days ago
I'm super excited to release my first open-source text dataset:

WorldScenario 20K is a novel dataset of 20,000 synthetically generated multi-stakeholder scenarios designed to simulate real-world decision-making processes. Each scenario explores a unique environmental, societal, or economic issue.

I used the brand-new meta-llama/Llama-3.3-70B-Instruct model to generate this dataset, then put it through some post-processing to clean it and evaluate it for diversity.
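
For context, a minimal sketch of the kind of cleaning / near-duplicate filtering described above (the "scenario" field name, the length threshold, and the fingerprinting are assumptions for illustration; the actual WorldScenario 20K pipeline isn't specified in the post):

# Toy cleaning / near-duplicate filter for synthetic scenarios.
# Field name and thresholds are assumptions, not the actual pipeline.
import re

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivially rephrased duplicates collide.
    return re.sub(r"\s+", " ", text.lower()).strip()

def clean_and_dedupe(rows: list[dict]) -> list[dict]:
    seen, kept = set(), []
    for row in rows:
        text = row.get("scenario", "")
        if len(text.split()) < 50:        # drop scenarios too short to be useful
            continue
        key = normalize(text)[:200]       # crude near-duplicate fingerprint
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept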

I'd appreciate some feedback and thoughts on my new release! Thanks!

takarajordan/WorldScenario_20K
posted an update 17 days ago
NEW RELEASE: Celestia 2!

- Multi-turn science-instruct conversations in the microsoft/orca-agentinstruct-1M-v1 style, generated by meta-llama/Llama-3.1-405B-Instruct
- 100% challenging, multi-turn conversations focused on physics, chemistry, computer science, biology, Earth science, and more!

Celestia 2 will be one of the datasets used to train the upcoming agent-instruct model, Shining Valiant 3. very excited for this :)

Get it now: sequelbox/Celestia2
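
If you want to poke at it, a standard way to load it with the datasets library (a sketch; the split name and column layout are assumptions, check the dataset card for the actual schema):

from datasets import load_dataset

ds = load_dataset("sequelbox/Celestia2", split="train")
print(ds)      # column names and row count
print(ds[0])   # one multi-turn science-instruct conversation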

do as you will. there is only the sea.
posted an update about 1 month ago
posted an update about 2 months ago
NEW RELEASE! Shining Valiant 2 for Llama 3.1 70b is here!

- Trained on high quality science-instruct, complex queries, and general chat data!
- Uses our newest datasets, ALL open-sourced for everyone to use!

GET SV2 70B: ValiantLabs/Llama3.1-70B-ShiningValiant2
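
A standard transformers loading sketch (one reasonable setup, not an official recommendation; a 70B model in bf16 needs roughly 140 GB of GPU memory):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ValiantLabs/Llama3.1-70B-ShiningValiant2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Why do neutron stars spin so fast?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))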

- Find the SV datasets here, including the expanded version of our science-instruct dataset:
- sequelbox/Celestia
- sequelbox/Spurline
- sequelbox/Supernova
- SV2 8b and 3b will be updated with the new datasets soon!

Enjoy! :)
posted an update 3 months ago
replied to their post 3 months ago
posted an update 3 months ago
posted an update 3 months ago
posted an update 3 months ago
NEW RELEASE! We've brought Shining Valiant 2 to Llama 3.2!

ValiantLabs/Llama3.2-3B-ShiningValiant2 is trained on high-quality general chat and science-instruct data! Get it now :)

(Enigma's up next for 3b, that'll be out soon!)

Additionally, newly expanded versions of the following datasets are now available:

sequelbox/Supernova is now 178k rows of high-quality synthetic general chat data.
sequelbox/Tachibana is now 104k rows of high-quality synthetic code-instruct data.

for everyone to use :)
more soon
reacted to fdaudens's post with 🚀 3 months ago
🚀 1,000,000 public models milestone achieved on Hugging Face! 🤯

This chart by @cfahlgren1 shows the explosive growth of open-source AI. It's not just about numbers - it's a thriving community combining cutting-edge ML with real-world applications. cfahlgren1/hub-stats

Can't wait to see what's next!
posted an update 3 months ago
today's release: the updated Supernova general chat dataset!

- the new Supernova has 2x the rows, continuing to provide high-quality general synthetic data generated with Llama 405b Instruct.

Find it at sequelbox/Supernova

Enjoy! There's also a new version of sequelbox/Llama3.1-8B-MOTH available using the new dataset. (new and better MOTHs for other models will come as well, but the Build Tools and Shining Valiant take priority.)
posted an update 3 months ago
replied to their post 3 months ago

coming next: newest version of Shining Valiant + the new science-instruct dataset that she'll be using as a knowledge base. very excited for this!
after that more build tool releases :)

posted an update 3 months ago
reacted to vilarin's post with 🚀 4 months ago
posted an update 4 months ago
posted an update 4 months ago
new synthetic general chat dataset! meet Supernova, a dataset using prompts from UltraFeedback and responses from Llama 3.1 405b Instruct: sequelbox/Supernova

new model(s) using the Supernova dataset will follow next week, along with Other Things. (One of these will be a newly updated version of Enigma, utilizing the next version of sequelbox/Tachibana with approximately 2x the rows!)
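
For reference, a hedged sketch of that kind of pipeline (UltraFeedback prompts in, 405B responses out) using the huggingface_hub InferenceClient; the dataset ID, the "instruction" field name, and whether 405B-Instruct is reachable through your inference provider are all assumptions here:

from datasets import load_dataset
from huggingface_hub import InferenceClient

# Assumptions: "openbmb/UltraFeedback" as the prompt source with an "instruction"
# column, and Llama-3.1-405B-Instruct being served by your configured provider.
prompts = load_dataset("openbmb/UltraFeedback", split="train")
client = InferenceClient("meta-llama/Llama-3.1-405B-Instruct")

for row in prompts.select(range(3)):        # generate responses for a few prompts
    reply = client.chat_completion(
        messages=[{"role": "user", "content": row["instruction"]}],
        max_tokens=512,
    )
    print(reply.choices[0].message.content)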
posted an update 4 months ago
reacted to qq8933's post with 🚀 4 months ago