t.d.a.g. PRO

sequelbox

sequelbox.bsky.social

AI & ML interests

open source, infinite games. (they/them)

Recent Activity

liked a model 6 days ago

sequelbox/Qwen3-8B-Esper3-PREVIEW

updated a model 6 days ago

sequelbox/Qwen3-8B-Esper3-PREVIEW

sequelbox's activity

posted an update 6 days ago

Post

281

EARLY RELEASE PREVIEW of Esper 3 for Qwen 3 8b!

- Reasoning finetune focused on coding, architecture, DevOps, and general reasoning
- Trained using DeepSeek-R1 685b synthetic data
- Official Apache 2.0 release coming soon on Valiant Labs: try out the preview for now and see what you think!

Try it out: sequelbox/Qwen3-8B-Esper3-PREVIEW

with my love,
allegra

posted an update 29 days ago

Post

1735

TITANIUM 2 DeepSeek-R1 dataset is here! Open-source synthetic architecture and DevOps dataset: sequelbox/Titanium2-DeepSeek-R1

Esper 3 will be coming out soon for multiple base models, trained on Titanium, Raiden, and more :)

with my love,
allegra

reacted to mkurman's post with ❤️ 2 months ago

Post

3697

Introducing a new architecture, MedIT One – a single-token transformer with LSTM-like recurrence.

It is extremely fast in training and inference, but we lack funding for large-scale training. Enjoy 🍓

https://github.com/MedITSolutionsKurman/medit-one

reacted to singhsidhukuldeep's post with 👍 2 months ago

Post

6888

Exciting New Tool for Knowledge Graph Extraction from Plain Text!

I just came across a groundbreaking new tool called KGGen that's solving a major challenge in the AI world - the scarcity of high-quality knowledge graph data.

KGGen is an open-source Python package that leverages language models to extract knowledge graphs (KGs) from plain text. What makes it special is its innovative approach to clustering related entities, which significantly reduces sparsity in the extracted KGs.

The technical approach is fascinating:

1. KGGen uses a multi-stage process involving an LLM (GPT-4o in their implementation) to extract entities and relations from source text
2. It aggregates graphs across sources to reduce redundancy
3. Most importantly, it applies iterative LM-based clustering to refine the raw graph

The clustering stage is particularly innovative - it identifies which nodes and edges refer to the same underlying entities or concepts. This normalizes variations in tense, plurality, stemming, and capitalization (e.g., "labors" clustered with "labor").
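The aggregate-then-cluster idea described above can be sketched in a few lines. This is a toy illustration only, not KGGen's code or API: the triples are made up, and the `canonical` function is a hypothetical rule-based stand-in for the LM-based clustering the post describes.

```python
from collections import defaultdict

# Hypothetical per-source triples, standing in for what an LLM
# extraction step (step 1) might return. Not real KGGen output.
source_a = [("Workers", "perform", "labors"), ("Labor", "creates", "Value")]
source_b = [("workers", "perform", "labor"), ("labor", "creates", "value")]


def canonical(term: str) -> str:
    """Toy stand-in for LM-based clustering: lowercase, strip a plural 's'."""
    term = term.strip().lower()
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        term = term[:-1]
    return term


def aggregate(*sources):
    """Step 2: merge triples from all sources, dropping exact duplicates."""
    merged = set()
    for triples in sources:
        merged.update(triples)
    return merged


def cluster(triples):
    """Step 3 (simplified): map each node to a canonical form, then dedupe."""
    clusters = defaultdict(set)
    refined = set()
    for subj, rel, obj in triples:
        s, o = canonical(subj), canonical(obj)
        clusters[s].add(subj)  # remember which surface forms were merged
        clusters[o].add(obj)
        refined.add((s, rel.lower(), o))
    return refined, dict(clusters)


raw = aggregate(source_a, source_b)
graph, clusters = cluster(raw)
print(len(raw), "raw triples ->", len(graph), "after clustering")
print(sorted(clusters["labor"]))
```

A string rule like this only catches trivial variants; the point of using a language model for this stage, as the post describes it, is to merge forms that simple rules would miss.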

The researchers, from Stanford and the University of Toronto, also introduced MINE (Measure of Information in Nodes and Edges), the first benchmark for evaluating KG extractors. When tested against existing methods like OpenIE and GraphRAG, KGGen outperformed them by up to 18%.

For anyone working with knowledge graphs, RAG systems, or KG embeddings, this tool addresses the fundamental challenge of data scarcity that's been holding back progress in graph-based foundation models.

The package is available via pip install kg-gen, making it accessible to everyone. This could be a game-changer for knowledge graph applications!

posted an update 3 months ago

Post

2823

Raiden is here! 63k creative-reasoning and analytic-reasoning prompts answered by DeepSeek's 685b R1 model!

- All prompts from microsoft/orca-agentinstruct-1M-v1 and all responses from deepseek-ai/DeepSeek-R1
- A deep look at R1's reasoning skills! Use as you will.

Get it now: sequelbox/Raiden-DeepSeek-R1

for everyone :)

reacted to rubenroy's post with 🚀 3 months ago

Post

2565

🔥🚀 Hey everyone! I'm excited to share my latest LLM release: Gilgamesh 72B, a model built on Qwen 2.5-72B Instruct. Gilgamesh was trained on a couple of my GammaCorpus datasets, specifically:

- rubenroy/GammaCorpus-CoT-Math-170k
- rubenroy/GammaCorpus-v2-5m
- rubenroy/GammaCorpus-Fact-QA-450k

I've submitted GGM 72B to the Open LLM Leaderboard for benchmarking; I'll post an update once the results are in!

Check out the model page, see what it can do, and share your feedback:
👉 rubenroy/Gilgamesh-72B

Would love to hear your thoughts!

posted an update 3 months ago

Post

1948

New sneak preview of my next release! Raiden is a deepseek-ai/DeepSeek-R1 synthetic dataset that uses creative-reasoning and analytic-reasoning prompts!

This preview release has the first 5.8k rows, all responses generated using DeepSeek's 685b parameter R1 model: https://huggingface.co/datasets/sequelbox/Raiden-DSR1-PREVIEW

Enjoy this look at R1's reasoning skills! Full dataset coming soon.

reacted to victor's post with 🚀 3 months ago

Post

3140

Finally, an open-source AI that turns your lyrics into full songs is here—meet YuE! Unlike other tools that only create short clips, YuE can make entire songs (up to 5 minutes) with vocals, melody, and instruments all working together. Letsss go!

m-a-p/YuE-s1-7B-anneal-en-cot

posted an update 3 months ago

Post

2361

A general FYI that Valiant Labs no longer has an X account. This is a business decision. Many other businesses seem to be making the same decision right now.

You can follow my account on Bluesky for updates on Shining Valiant 3, other Valiant Labs models, my open-source datasets, etc: https://bsky.app/profile/sequelbox.bsky.social

back to building :)

posted an update 4 months ago

Post

1400

NEW RELEASE: the sequelbox/Tachibana-QVQ dataset is here! Code-reasoning and code-instruct data generated with Qwen/QVQ-72B-Preview

Come check out QVQ's coding skills!

for everyone to use!

more QVQ and Llama 3.1 405b datasets coming soon :)

reacted to DawnC's post with ❤️ 4 months ago

Post

2332

🌟 PawMatchAI: Making Breed Selection More Intuitive! 🐕
Excited to share the latest update to this AI-powered companion for finding your perfect furry friend! I've made significant architectural improvements to enhance breed recognition accuracy and feature detection.

✨ What's New?
Enhanced breed recognition through advanced morphological feature analysis:
- Implemented a sophisticated feature extraction system that analyzes specific characteristics like body proportions, head features, tail structure, fur texture, and color patterns
- Added an intelligent attention mechanism that dynamically focuses on the most relevant features for each image
- Improved multi-dog detection capabilities through enhanced spatial feature analysis
- Achieved better precision in distinguishing subtle breed characteristics

🎯 Key Features:
- Smart breed recognition powered by advanced AI architecture
- Visual matching scores with intuitive color indicators
- Detailed breed comparisons with interactive tooltips
- Lifestyle-based recommendations tailored to your needs

💭 Project Vision
Combining my passion for AI and pets, this project represents another step toward creating meaningful AI applications. Each update aims to make the breed selection process more accessible while improving the underlying technology.

👉 Try it now: DawnC/PawMatchAI

Your likes ❤️ on this space fuel this project's growth!

#AI #MachineLearning #DeepLearning #Pytorch #ComputerVision #TechForLife

2 replies

posted an update 4 months ago

Post

2173

Check out the early preview of the upcoming Tachibana-QVQ dataset: code-reasoning and code-instruct data generated with Qwen/QVQ-72B-Preview

Link here: sequelbox/Tachibana-QVQ-PREVIEW

more to come :)

1 reply

reacted to m-ric's post with 👀 5 months ago

Post

2597

𝐇𝐮𝐠𝐠𝐢𝐧𝐠 𝐅𝐚𝐜𝐞 𝐫𝐞𝐥𝐞𝐚𝐬𝐞𝐬 𝐏𝐢𝐜𝐨𝐭𝐫𝐨𝐧, 𝐚 𝐦𝐢𝐜𝐫𝐨𝐬𝐜𝐨𝐩𝐢𝐜 𝐥𝐢𝐛 𝐭𝐡𝐚𝐭 𝐬𝐨𝐥𝐯𝐞𝐬 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝟒𝐃 𝐩𝐚𝐫𝐚𝐥𝐥𝐞𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧 🥳

🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years on a single GPU.

👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons."

🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
And it is infamously hard to do, making for bloated code repos that hold together only by magic.
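The arithmetic behind those figures is easy to check. The sketch below uses only the two numbers quoted in the post (39 million GPU-hours, 24k GPUs) and assumes perfect linear scaling, which real training runs do not achieve, so the parallel figure is a lower bound on wall-clock time.

```python
HOURS_PER_YEAR = 24 * 365

gpu_hours = 39_000_000  # total training cost quoted in the post
num_gpus = 24_000       # "24k H100s" from the post

# Time if a single GPU had to do all the work.
serial_years = gpu_hours / HOURS_PER_YEAR

# Time if the work splits perfectly across all GPUs (idealized assumption).
parallel_hours = gpu_hours / num_gpus
parallel_days = parallel_hours / 24

print(f"one GPU: {serial_years:,.0f} years")
print(f"{num_gpus:,} GPUs: {parallel_days:.0f} days at perfect scaling")
```

Under these assumptions the run comes out at roughly two months of ideal compute; restarts, communication overhead, and evaluation would stretch that toward the "few months" mentioned above.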

🤏 𝗕𝘂𝘁 𝗻𝗼𝘄 𝘄𝗲 𝗱𝗼𝗻'𝘁 𝗻𝗲𝗲𝗱 𝗵𝘂𝗴𝗲 𝗿𝗲𝗽𝗼𝘀 𝗮𝗻𝘆𝗺𝗼𝗿𝗲! Instead of building mega-training codebases, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. One team built Nanotron, already widely used in industry.
And now a team has released Picotron, a radical approach that implements 4D parallelism in just a few hundred lines of code, a real engineering feat that makes it much easier to understand what's actually happening!

⚡ 𝗜𝘁'𝘀 𝘁𝗶𝗻𝘆, 𝘆𝗲𝘁 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹:
Measured in MFU (Model FLOPs Utilization, the fraction of the hardware's peak compute that training actually achieves), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is running further benchmarks to verify this.)
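For readers unfamiliar with MFU, here is a minimal sketch of how it is commonly estimated. It rests on assumptions that are not from the post: the usual approximation of about 6 FLOPs per parameter per training token (which ignores attention FLOPs), and an assumed H100 dense BF16 peak of 989 TFLOP/s. The function names are hypothetical, and no measured Picotron throughput is used; the code only back-computes what throughput a 50% MFU would imply.

```python
def mfu(params: float, tokens_per_second: float,
        num_gpus: int, peak_flops_per_gpu: float) -> float:
    """Estimate MFU as achieved training FLOP/s over total peak FLOP/s."""
    achieved = 6 * params * tokens_per_second  # ~6 FLOPs per param per token
    return achieved / (num_gpus * peak_flops_per_gpu)


def implied_tokens_per_second(target_mfu: float, params: float,
                              num_gpus: int, peak_flops_per_gpu: float) -> float:
    """Invert the estimate: the throughput a given MFU would correspond to."""
    return target_mfu * num_gpus * peak_flops_per_gpu / (6 * params)


PARAMS = 1.7e9       # SmolLM-1.7B, from the post
NUM_GPUS = 8         # from the post
H100_PEAK = 989e12   # ASSUMPTION: dense BF16 peak per H100, in FLOP/s

tps = implied_tokens_per_second(0.5, PARAMS, NUM_GPUS, H100_PEAK)
print(f"50% MFU would imply roughly {tps:,.0f} tokens/s across {NUM_GPUS} GPUs")
print(f"round trip: MFU = {mfu(PARAMS, tps, NUM_GPUS, H100_PEAK):.0%}")
```

Different libraries count FLOPs slightly differently, so MFU figures are only comparable when the same accounting is used.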

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron

1 reply

reacted to takarajordan's post with ❤️ 5 months ago

Post

2483

I'm super excited to release my first open-source text dataset:

WorldScenario 20K is a novel dataset of 20,000 synthetically generated multi-stakeholder scenarios designed to simulate real-world decision-making processes. Each scenario explores a unique environmental, societal, or economic issue.

I used the brand new meta-llama/Llama-3.3-70B-Instruct model to generate this dataset, then put it through some post-processing to clean it and evaluate it for diversity.

I'd appreciate some feedback and thoughts on my new release! Thanks!

takarajordan/WorldScenario_20K

8 replies

posted an update 6 months ago

Post

1252

next version of sequelbox/Celestia will be microsoft/orca-agentinstruct-1M-v1 style. coming soon

1 reply

posted an update 7 months ago

Post

1243

NEW releases for today:

- We've brought our new Esper 2 model to Llama 3.2! The DevOps-first Esper finetunes use our newest open source datasets. Get the new Esper: ValiantLabs/Llama3.2-3B-Esper2
- Some new merged models, combining Shining Valiant 2 with the other Build Tools:
  - sequelbox/Llama3.1-8B-PlumCode
  - sequelbox/Llama3.1-8B-PlumChat
  - sequelbox/Llama3.1-8B-PlumMath

more to come soon :)

1 reply

posted an update 7 months ago

Post

1503

Llama 3.2 3b + code-instruct: get our newest version of Enigma!

ValiantLabs/Llama3.2-3B-Enigma is trained on our high-quality code-instruct (sequelbox/Tachibana) and general chat (sequelbox/Supernova) data.

try it now :) more models and datasets coming soon!

posted an update 7 months ago

Post

499

NEW RELEASE! We've brought Shining Valiant 2 to Llama 3.2!

ValiantLabs/Llama3.2-3B-ShiningValiant2 is trained on high-quality general chat and science-instruct data! Get it now :)

(Enigma's up next for 3b, that'll be out soon!)

Additionally, newly expanded versions of the following datasets are now available:

sequelbox/Supernova is now 178k rows of high-quality synthetic general chat data.
sequelbox/Tachibana is now 104k rows of high-quality synthetic code-instruct data.

for everyone to use :)
more soon

reacted to fdaudens's post with 🚀 7 months ago

Post

3396

🚀 1,000,000 public models milestone achieved on Hugging Face! 🤯

This chart by @cfahlgren1 shows the explosive growth of open-source AI. It's not just about numbers - it's a thriving community combining cutting-edge ML with real-world applications. cfahlgren1/hub-stats

Can't wait to see what's next!

2 replies

posted an update 8 months ago

Post

1402

Just released: our newest version of Shining Valiant, powered by an all-new science-instruct dataset!

The model: ValiantLabs/Llama3.1-8B-ShiningValiant2
The dataset: sequelbox/Celestia

tell your friends! for everyone to use.

(more releases coming later this week! including an expansion of the sequelbox/Supernova generalist dataset.)