AI & ML interests
None defined yet.
Recent Activity
fr-gouv-coordination-ia's activity

BrigitteTousi posted an update about 17 hours ago
Post
Honored to be named among the 12 pioneers and power players in the news industry in the 2025 Tech Trends Report from Future Today Strategy Group.
Incredible group to be part of - each person is doing groundbreaking work at the intersection of AI and journalism. Worth following them all: they're consistently sharing practical insights on building the future of news.
Take the time to read this report, it's packed with insights as always. The news & information section's #1 insight hits hard: "The most substantive economic impact of AI to date has been licensing payouts for a handful of big publishers. The competition will start shifting in the year ahead to separate AI 'haves' that have positioned themselves to grow from the 'have-nots.'"
This AI-driven divide is something I've been really concerned about. Now is the time to build more than ever!
👉 Full report here: https://ftsg.com/wp-content/uploads/2025/03/FTSG_2025_TR_FINAL_LINKED.pdf
Post
AI will bring us "a country of yes-men on servers" instead of one of "Einsteins sitting in a data center" if we continue on current trends.
Must-read by @thomwolf deflating overblown AI promises and explaining what real scientific breakthroughs require.
https://thomwolf.io/blog/scientific-ai.html
Post
What if AI becomes as ubiquitous as the internet, but runs locally and transparently on our devices?
Fascinating TED talk by @thomwolf on open source AI and its future impact.
Imagine this for AI: instead of black box models running in distant data centers, we get transparent AI that runs locally on our phones and laptops, often without needing internet access. If the original team moves on? No problem - resilience is one of the beauties of open source. Anyone (companies, collectives, or individuals) can adapt and fix these models.
This is a compelling vision of AI's future that solves many of today's concerns around AI transparency and centralized control.
Watch the full talk here: https://www.ted.com/talks/thomas_wolf_what_if_ai_just_works
Post
Is this the best tool to extract clean info from PDFs, handwriting and complex documents yet?
Open source olmOCR just dropped and the results are impressive.
Tested the free demo with various documents, including a handwritten Claes Oldenburg letter. The speed is impressive: 3,000 tokens per second on your own GPU, which works out to about $190 per million pages, roughly 1/32 the cost of GPT-4o. Game-changer for content extraction and digital archives.
To achieve this, Ai2 trained a 7B vision language model on 260K pages from 100K PDFs using "document anchoring" - combining PDF metadata with page images.
Best part: it actually understands document structure (columns, tables, equations) instead of just jumbling everything together like most OCR tools. Their human eval results back this up.
👉 Try the demo: https://olmocr.allenai.org
Going right into the AI toolkit: JournalistsonHF/ai-toolkit
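The cost claim above is easy to sanity-check with simple arithmetic. A minimal sketch, assuming the $190-per-million-pages figure is olmOCR's own cost and GPT-4o is roughly 32x that (both numbers come from the post, not from independent measurement):

```python
# Back-of-the-envelope cost comparison for large OCR jobs.
# Assumption: $190/million pages is olmOCR's cost, and GPT-4o is ~32x that
# (figures taken from the post above, not measured independently).
OLMOCR_COST_PER_M_PAGES = 190.0                       # USD per million pages
GPT4O_COST_PER_M_PAGES = OLMOCR_COST_PER_M_PAGES * 32

def ocr_cost(pages: int, cost_per_m_pages: float) -> float:
    """Estimated USD cost to OCR `pages` pages at a given per-million-page rate."""
    return pages / 1_000_000 * cost_per_m_pages

# Example: digitizing a hypothetical 10-million-page newspaper archive.
print(ocr_cost(10_000_000, OLMOCR_COST_PER_M_PAGES))  # 1900.0
print(ocr_cost(10_000_000, GPT4O_COST_PER_M_PAGES))   # 60800.0
```

At archive scale, the 32x multiplier is the difference between a line item and a budget request.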
Post
🚀 Just launched: A toolkit of 20 powerful AI tools that journalists can use right now - transcribe, analyze, create. 100% free & open-source.
Been testing all these tools myself and created a searchable collection of the most practical ones - from audio transcription to image generation to document analysis. No coding needed, no expensive subscriptions.
Some highlights I've tested personally:
- Private, on-device transcription with speaker ID in 100+ languages using Whisper
- Website scraping that just works - paste a URL, get structured data
- Local image editing with tools like Finegrain (impressive results)
- Document chat using Qwen 2.5 72B (handles technical papers well)
Sharing this early because the best tools come from the community. Drop your favorite tools in the comments or join the discussion on what to add next!
👉 JournalistsonHF/ai-toolkit
Post
Trying something new to keep you ahead of the curve: The 5 AI stories of the week - a weekly curation of the most important AI news you need to know. Do you like it?
For more AI stories and deeper analysis, check out my newsletter: https://open.substack.com/pub/fdaudens/p/ai-competition-heats-up-grok-3-iphone
Post
🎯 Perplexity drops their FIRST open-weight model on Hugging Face: A decensored DeepSeek-R1 with full reasoning capabilities. Tested on 1000+ examples for unbiased responses.
Check it out: perplexity-ai/r1-1776
Blog post: https://perplexity.ai/hub/blog/open-sourcing-r1-1776
Post
Will we soon all have our own personalized AI news agents? And what does it mean for journalism?
Just built a simple prototype based on the Hugging Face course. It lets you get customized news updates on any topic.
Not perfect yet, but you can see where things could go: we'll all be able to build personalized AI agents that curate and analyze news for each of us. Users could also build custom news products for their own needs, such as truly personalized newsletters or podcasts.
The implications for both readers & news organizations are significant. To name a few:
- Will news articles remain the best format for informing people?
- What monetization model will work for news organizations?
- How do you create an effective conversion funnel?
👉 Try it here: fdaudens/my-news-agent (Code is open-source)
👉 Check out the course: https://huggingface.co/learn/agents-course/unit0/introduction
Post
🔊 Meet Kokoro Web - free ML speech synthesis on your computer that'll make you ditch paid services!
28 natural voices, unlimited generations, and WebGPU acceleration. Perfect for journalists and content creators.
Test it with full articles—sounds amazingly human! 🎯🎙️
Xenova/kokoro-web
Post
⭐️ The AI Energy Score project just launched - this is a game-changer for making informed decisions about AI deployment.
You can now see exactly how much energy your chosen model will consume, with a simple 5-star rating system. Think appliance energy labels, but for AI.
Looking at transcription models on the leaderboard is fascinating: choosing between whisper-tiny and whisper-large-v3 can make a 7x difference in energy use. Real-time data on these tradeoffs changes everything.
166 models already evaluated across 10 different tasks, from text generation to image classification. The whole thing is public and you can submit your own models to test.
Why this matters:
- Teams can pick efficient models that still get the job done
- Developers can optimize for energy use from day one
- Organizations can finally predict their AI environmental impact
If you're building with AI at any scale, definitely worth checking out.
👉 leaderboard: https://lnkd.in/esrSxetj
👉 blog post: https://lnkd.in/eFJvzHi8
Huge work led by @sasha with @bgamazay @yjernite @sarahooker @regisss @meg
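To illustrate the appliance-label idea, here is a hypothetical sketch of how a 5-star efficiency rating could be assigned by ranking models on the same task. Both the bucketing rule and the energy numbers are made up for illustration; the project's actual methodology may differ:

```python
# Hypothetical appliance-style star rating: rank models on one task by
# energy per query, then bucket the ranking into 5 stars (5 = most efficient).
# This is an illustration of the concept, NOT the AI Energy Score methodology.
def star_ratings(energy_wh: dict[str, float], stars: int = 5) -> dict[str, int]:
    """Map each model to a 1..stars rating; lower energy earns more stars."""
    ranked = sorted(energy_wh, key=energy_wh.get)  # most efficient first
    n = len(ranked)
    return {model: stars - (i * stars) // n for i, model in enumerate(ranked)}

# Illustrative (made-up) energies in Wh per 1,000 queries. Note the ~7x
# spread between the smallest and largest model, echoing the post above.
models = {"whisper-tiny": 2.0, "whisper-base": 4.5, "whisper-small": 7.0,
          "whisper-medium": 10.0, "whisper-large-v3": 14.0}
print(star_ratings(models))
# {'whisper-tiny': 5, 'whisper-base': 4, 'whisper-small': 3,
#  'whisper-medium': 2, 'whisper-large-v3': 1}
```

A rating like this only makes sense within a task: comparing a transcription model's stars to an image classifier's would mix incomparable workloads.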
Post
🔥 Video AI is taking over! Out of 17 papers dropped on Hugging Face today, 6 are video-focused - from Sliding Tile Attention to On-device Sora. The race for next-gen video tech is heating up! 🎬🚀
Post
📢 SmolLM2 paper released! Learn how the 🤗 team built one of the best small language models: from data choices to training insights. Check out our findings and share your thoughts! 🤏💡
Check it out: SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (2502.02737)

clefourrier authored a paper about 1 month ago
Post
📊 R1 just built its own download dashboard!
Some fresh stats: +6M downloads for 800+ derivative models vs 2M for originals. Watch the numbers grow here: fdaudens/deepseek-download-stats
Post
🎯 Kokoro TTS just hit v1.0! 🚀
Small but mighty: 82M parameters, runs locally, speaks multiple languages. The best part? It's Apache 2.0 licensed!
This could unlock so many possibilities ✨
Check it out: hexgrad/Kokoro-82M
Post
Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:
- Original release: 8 models, 540K downloads. Just the beginning...
- The community turned those open-weight models into +550 NEW models on Hugging Face. Total downloads? 2.5M—nearly 5X the originals.
The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interesting to note that the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.
When you empower builders, innovation explodes. For everyone. 🚀
The most popular community model? @bartowski 's DeepSeek-R1-Distill-Qwen-32B-GGUF version — 1M downloads alone.
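The "nearly 5X" figure follows directly from the numbers quoted in the post; a quick check:

```python
# Quick check of the download ratio quoted above
# (both figures taken from the post itself, not from a live API query).
original_downloads = 540_000      # the 8 original DeepSeek R1 models
derivative_downloads = 2_500_000  # the 550+ community-built derivatives

ratio = derivative_downloads / original_downloads
print(round(ratio, 1))  # 4.6 -- "nearly 5X the originals"
```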
Post
What's at stake with Meta's decision to change its content moderation policy?
@giadap has, by far, the most thoughtful take I've seen on this question. Read her op-ed: https://www.techpolicy.press/when-freedom-bites-back-meta-moderation-and-the-limits-of-tolerance/