Today in Privacy & AI Tooling - introducing a nifty new tool to examine where data goes in open-source apps on ๐ค
HF Spaces have tons (100Ks!) of cool demos leveraging or examining AI systems - and because most of them are OSS we can see exactly how they handle user data ๐๐
That requires actually reading the code though, which isn't always easy or quick! Good news: code LMs have gotten pretty good at automatic review, so we can offload some of the work - here I'm using Qwen/Qwen2.5-Coder-32B-Instruct to generate reports and it works pretty OK ๐
The app works in three stages: 1. Download all code files 2. Use the Code LM to generate a detailed report pointing to code where data is transferred/(AI-)processed (screen 1) 3. Summarize the app's main functionality and data journeys (screen 2) 4. Build a Privacy TLDR with those inputs
It comes with a bunch of pre-reviewed apps/Spaces, great to see how many process data locally or through (private) HF endpoints ๐ค
I read the 456-page AI Index report so you don't have to (kidding). The wild part? While AI gets ridiculously more accessible, the power gap is actually widening:
1๏ธโฃ The democratization of AI capabilities is accelerating rapidly: - The gap between open and closed models is basically closed: difference in benchmarks like MMLU and HumanEval shrunk to just 1.7% in 2024 - The cost to run GPT-3.5-level performance dropped 280x in 2 years - Model size is shrinking while maintaining performance - Phi-3-mini hitting 60%+ MMLU at fraction of parameters of early models like PaLM
2๏ธโฃ But we're seeing concerning divides deepening: - Geographic: US private investment ($109B) dwarfs everyone else - 12x China's $9.3B - Research concentration: US and China dominate highly-cited papers (50 and 34 respectively in 2023), while next closest is only 7 - Gender: Major gaps in AI skill penetration rates - US shows 2.39 vs 1.71 male/female ratio
The tech is getting more accessible but the benefits aren't being distributed evenly. Worth thinking about as these tools become more central to the economy.
AI agents are transforming how we interact with technology, but how sustainable are they? ๐
Design choices โ like model size and structure โ can massively impact energy use and cost. โก๐ฐ The key takeaway: smaller, task-specific models can be far more efficient than large, general-purpose ones.
๐ Open-source models offer greater transparency, allowing us to track energy consumption and make more informed decisions on deployment. ๐ฑ Open-source = more efficient, eco-friendly, and accountable AI.
Huge week for xet-team as Llama 4 is the first major model on Hugging Face uploaded with Xet providing the backing! Every byte downloaded comes through our infrastructure.
Using Xet on Hugging Face is the fastest way to download and iterate on open source models and we've proved it with Llama 4 giving a boost of ~25% across all models.
We expect builders on the Hub to see even more improvements, helping power innovation across the community.
With the models on our infrastructure, we can peer in and see how well our dedupe performs across the Llama 4 family. On average, we're seeing ~25% dedupe, providing huge savings to the community who iterate on these state-of-the-art models. The attached image shows a few selected models and how they perform on Xet.
Thanks to the meta-llama team for launching on Xet!
We've all become experts at clicking "I agree" without a second thought. In my latest blog post, I explore why these traditional consent models are increasingly problematic in the age of generative AI.
I found three fundamental challenges: - Scope problem: how can you know what you're agreeing to when AI could use your data in different ways? - Temporality problem: once an AI system learns from your data, good luck trying to make it "unlearn" it. - Autonomy trap: the data you share today could create systems that pigeonhole you tomorrow.
Individual users shouldn't bear all the responsibility, while big tech holds all the cards. We need better approaches to level the playing field, from collective advocacy and stronger technological safeguards to establishing "data fiduciaries" with a legal duty to protect our digital interests.
We've all become experts at clicking "I agree" without a second thought. In my latest blog post, I explore why these traditional consent models are increasingly problematic in the age of generative AI.
I found three fundamental challenges: - Scope problem: how can you know what you're agreeing to when AI could use your data in different ways? - Temporality problem: once an AI system learns from your data, good luck trying to make it "unlearn" it. - Autonomy trap: the data you share today could create systems that pigeonhole you tomorrow.
Individual users shouldn't bear all the responsibility, while big tech holds all the cards. We need better approaches to level the playing field, from collective advocacy and stronger technological safeguards to establishing "data fiduciaries" with a legal duty to protect our digital interests.
โ Hosting our own inference was not enough: now the Hub 4 new inference providers: fal, Replicate, SambaNova Systems, & Together AI.
Check model cards on the Hub: you can now, in 1 click, use inference from various providers (cf video demo)
Their inference can also be used through our Inference API client. There, you can use either your custom provider key, or your HF token, then billing will be handled directly on your HF account, as a way to centralize all expenses.
๐ธ Also, PRO users get 2$ inference credits per month!
Exciting breakthrough in Retrieval-Augmented Generation (RAG): Introducing MiniRAG - a revolutionary approach that makes RAG systems accessible for edge devices and resource-constrained environments.
Key innovations that set MiniRAG apart:
Semantic-aware Heterogeneous Graph Indexing - Combines text chunks and named entities in a unified structure - Reduces reliance on complex semantic understanding - Creates rich semantic networks for precise information retrieval
Lightweight Topology-Enhanced Retrieval - Leverages graph structures for efficient knowledge discovery - Uses pattern matching and localized text processing - Implements query-guided reasoning path discovery
Impressive Performance Metrics - Achieves comparable results to LLM-based methods while using Small Language Models (SLMs) - Requires only 25% of storage space compared to existing solutions - Maintains robust performance with accuracy reduction ranging from just 0.8% to 20%
The researchers from Hong Kong University have also contributed a comprehensive benchmark dataset specifically designed for evaluating lightweight RAG systems under realistic on-device scenarios.
This breakthrough opens new possibilities for: - Edge device AI applications - Privacy-sensitive implementations - Real-time processing systems - Resource-constrained environments
The full implementation and datasets are available on GitHub: HKUDS/MiniRAG
1 reply
ยท
reacted to fdaudens's
post with โค๏ธ3 months ago
Reminder: Donโt. Use. ChatGPT. As. A. Calculator. Seriously. ๐ค
Loved listening to @sasha on Hard Forkโit really made me think.
A few takeaways that hit home: - Individual culpability only gets you so far. The real priority: demanding accountability and transparency from companies. - Evaluate if generative AI is the right tool for certain tasks (like search) before using it.
๐ซ...And we're live!๐ซ Seasonal newsletter from ethicsy folks at Hugging Face, exploring the ethics of "AI Agents" https://huggingface.co/blog/ethics-soc-7 Our analyses found: - There's a spectrum of "agent"-ness - *Safety* is a key issue, leading to many other value-based concerns Read for details & what to do next! With @evijit , @giadap , and @sasha
๐ค๐ค ๐ป Speaking of AI agents ... ...Is easier with the right words ;)
My colleagues @meg@evijit@sasha and @giadap just published a wonderful blog post outlining some of the main relevant notions with their signature blend of value-informed and risk-benefits contrasting approach. Go have a read!
The paper has a lot of experiments (they trained 84 models!) about what makes the video LMs work โฏ๏ธ
Try the demo for best setup here https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B they evaluate sampling strategies, scaling laws for models and datasets, video representation and more! > The authors find out that whatever design decision was applied to small models also scale properly when the model and dataset are scaled ๐ scaling dataset has diminishing returns for smaller models > They evaluate frame sampling strategies, and find that FPS sampling is better than uniform sampling, and they find 8-32 tokens per frame optimal > They also compare image encoders, they try a variation of models from shape optimized SigLIP to DINOv2 they find google/siglip-so400m-patch14-384 to be most powerful ๐ฅ > they also compare freezing different parts of models, training all stages with some frozen parts give the best yield
They eventually release three models, where Apollo-3B outperforms most 7B models and Apollo 7B outperforms 30B models ๐ฅ
Did a fun experiment: What are the main themes emerging from the 100+ Nieman Journalism Lab predictions for 2025?
I used natural language processing to cluster and map them โ really helps spot patterns that weren't obvious when reading predictions one by one. So what will shape journalism next year? A lot of AI and US politics (surprise!), but there's also this horizontal axis that spans from industry strategies to deep reflections on how to talk to the public.
Click any dot to explore the original prediction. What themes surprise/interest you the most?
๐ช๐บ Policy Thoughts in the EU AI Act Implementation ๐ช๐บ
There is a lot to like in the first draft of the EU GPAI Code of Practice, especially as regards transparency requirements. The Systemic Risks part, on the other hand, is concerning for both smaller developers and for external stakeholders.
I wrote more on this topic ahead of the next draft. TLDR: more attention to immediate large-scale risks and to collaborative solutions supported by evidence can help everyone - as long as developers disclose sufficient information about their design choices and deployment contexts.