
Bluesky Community
community
AI & ML interests
Tools for Bluesky 🦋
bluesky-community's activity

davanstrien posted an update 9 days ago
Post
1583
I've created a v1 dataset (davanstrien/reasoning-required) and model (davanstrien/ModernBERT-based-Reasoning-Required) to help curate "wild text" data for generating reasoning examples beyond the usual code/math/science domains.
- I developed a "Reasoning Required" dataset with a 0-4 scoring system for reasoning complexity
- I used educational content from HuggingFaceFW/fineweb-edu, adding annotations for domains, reasoning types, and example questions
My approach enables a more efficient workflow: filter text with small models first, then use LLMs only on high-value content.
This significantly reduces computation costs while expanding reasoning dataset domain coverage.
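A minimal sketch of the filter-then-annotate workflow described above, not the author's actual pipeline. The names `filter_then_annotate`, `toy_score`, and `toy_annotate` and the threshold of 3 are hypothetical; `toy_score` is a keyword heuristic standing in for a small classifier that outputs a 0-4 reasoning score, and `toy_annotate` stands in for an expensive LLM call.

```python
from typing import Callable, Dict, Iterable, List


def filter_then_annotate(
    texts: Iterable[str],
    score_fn: Callable[[str], float],
    annotate_fn: Callable[[str], str],
    min_score: float = 3.0,  # assumed cutoff on the 0-4 scale
) -> List[Dict[str, object]]:
    """Score every text with a cheap model; call the LLM only on high scorers."""
    results = []
    for text in texts:
        score = score_fn(text)
        if score < min_score:
            continue  # low-value text never reaches the expensive model
        results.append(
            {"text": text, "score": score, "annotation": annotate_fn(text)}
        )
    return results


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a small classifier: counts reasoning cues, capped at 4."""
    cues = ("why", "because", "therefore", "trade-off")
    hits = sum(text.lower().count(cue) for cue in cues)
    return float(min(4, hits * 2))


llm_calls: List[str] = []  # records which texts reached the "LLM"


def toy_annotate(text: str) -> str:
    """Hypothetical stand-in for an LLM that writes an example question."""
    llm_calls.append(text)
    return f"question about: {text[:30]}"


docs = [
    "The capital of France is Paris.",
    "Prices rose because supply fell; therefore policymakers face a trade-off.",
]
kept = filter_then_annotate(docs, toy_score, toy_annotate)
print(len(kept), "of", len(docs), "texts sent to the LLM")
```

In practice `score_fn` would wrap a small classifier and `annotate_fn` an LLM client; the point of the sketch is only that the expensive call happens after the cheap filter.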

BrigitteTousi posted an update 11 days ago
Post
2930
AI agents are transforming how we interact with technology, but how sustainable are they? 🌍
Design choices — like model size and structure — can massively impact energy use and cost. ⚡💰 The key takeaway: smaller, task-specific models can be far more efficient than large, general-purpose ones.
🔑 Open-source models offer greater transparency, allowing us to track energy consumption and make more informed decisions on deployment. 🌱 Open-source = more efficient, eco-friendly, and accountable AI.
Read our latest, led by @sasha with assists from myself + @yjernite 🤗
https://huggingface.co/blog/sasha/ai-agent-sustainability
Post
2630
Llama 4 is in transformers!
Fun example using the instruction-tuned Maverick model responding about two images, using tensor parallel for maximum speed.
From https://huggingface.co/blog/llama4-release
Post
1934
Llama models (arguably the most successful open AI models of all time) represented just 3% of total model downloads on Hugging Face in March.
People and media like winner-takes-all stories of one model or company to rule them all, but the reality is much more nuanced!
Kudos to all the small AI builders out there!
Post
3970
Before 2020, most of the AI field was open and collaborative. For me, that was the key factor that accelerated scientific progress and made the impossible possible—just look at the “T” in ChatGPT, which comes from the Transformer architecture openly shared by Google.
Then came the myth that AI was too dangerous to share, and companies started optimizing for short-term revenue. That led many major AI labs and researchers to stop sharing and collaborating.
With OAI and sama now saying they're willing to share open weights again, we have a real chance to return to a golden age of AI progress and democratization—powered by openness and collaboration, in the US and around the world.
This is incredibly exciting. Let’s go, open science and open-source AI!
Post
2239
Very interesting security section by @yjernite, @lvwerra, @reach-vb, @dvilasuero & the team replicating R1. Broadly applicable to most open-source models & some to APIs (but APIs have a lot more additional risks because you're not in control of the underlying system):
https://huggingface.co/blog/open-r1/update-4#is-it-safe
Post
1572
A repository is created every ~15 secs on Hugging Face so @kramp added a "Getting Started" to make it easier & a model release checklist: https://huggingface.co/docs/hub/model-release-checklist
What are you uploading today?
Post
2591
Nice new space to see how fast your personal or organization followers are growing on HF:
julien-c/follow-history
As you can see, I still have more followers than @julien-c even if he's trying to change this by building such cool spaces 😝😝😝

BrigitteTousi posted an update about 1 month ago
Post
3399
LeRobot goes to driving school! 🚗🚗🚗
Hugging Face just announced a new collab with Yaak to bring the largest open-source self-driving dataset to LeRobot!
Major kudos to HF's @cadene, as well as @sandhawalia, @Shnissen and the Yaak team!
Check out the blog post here: https://huggingface.co/blog/lerobot-goes-to-driving-school

BrigitteTousi posted an update about 1 month ago
Post
7300
I was chatting with @peakji, one of the cofounders of Manus AI, who told me he was on Hugging Face (very cool!).
He shared an interesting insight: agentic capabilities might be more of an alignment problem than a foundational capability issue. Similar to the difference between GPT-3 and InstructGPT, some open-source foundation models are simply trained to 'answer everything in one response regardless of the complexity of the question' - after all, that's the user preference in chatbot use cases. Just a bit of post-training on agentic trajectories can make an immediate and dramatic difference.
As a thank you to the community, he shared 100 invite codes, first come, first served. Just use “HUGGINGFACE” to get access!
Post
4687
10,000+ models based on DeepSeek R1 have been publicly shared on Hugging Face! Which ones are your favorites? https://huggingface.co/models?sort=trending&search=r1 Truly a game-changer!
Post
5925
Super happy to welcome Nvidia as our latest enterprise hub customer. They have almost 2,000 team members using Hugging Face, and close to 20,000 followers of their org. Can't wait to see what they'll open-source for all of us in the coming months!
Nvidia's org: nvidia
Enterprise hub: https://huggingface.co/enterprise

davanstrien posted an update about 2 months ago
Post
2924
📊 Introducing "Hugging Face Dataset Spotlight" 📊
I'm excited to share the first episode of our AI-generated podcast series focusing on nice datasets from the Hugging Face Hub!
This first episode explores mathematical reasoning datasets:
- SynthLabsAI/Big-Math-RL-Verified: Over 250,000 rigorously verified problems spanning multiple difficulty levels and mathematical domains
- open-r1/OpenR1-Math-220k: 220,000 math problems with multiple reasoning traces, verified for accuracy using Math Verify and Llama-3.3-70B models.
- facebook/natural_reasoning: 1.1 million general reasoning questions carefully deduplicated and decontaminated from existing benchmarks, showing superior scaling effects when training models like Llama3.1-8B-Instruct.
Plus a bonus segment on bespokelabs/bespoke-manim!
https://www.youtube.com/watch?v=-TgmRq45tW4

davanstrien posted an update about 2 months ago
Post
3671
Quick POC: Turn a Hugging Face dataset card into a short podcast introducing the dataset using all open models.
I think I'm the only weirdo who would enjoy listening to something like this though 😅
Here is an example for eth-nlped/stepverify