Florent Daudens's picture

Florent Daudens

fdaudens

AI & ML interests

AI & Journalism

Recent Activity

Organizations

Hugging Face's profile picture Hugging Face OSS Metrics's profile picture Hugging Face Smol Models Research's profile picture ZeroGPU Explorers's profile picture LeRobot's profile picture Journalists on Hugging Face's profile picture Major TOM's profile picture MLX Community's profile picture Social Post Explorers's profile picture Projet Spinoza's profile picture Dev Mode Explorers's profile picture Hugging Face for Legal's profile picture Hugging Face Discord Community's profile picture Big Science Social Impact Evaluation for Bias and Stereotypes's profile picture Dataset Tools's profile picture Hugging Face Science's profile picture Coordination Nationale pour l'IA's profile picture Data Is Better Together Contributor's profile picture Sandbox's profile picture Open R1's profile picture

fdaudens's activity

posted an update about 3 hours ago
view post
Post
104
Did we just drop personalized AI evaluation?! This tool auto-generates custom benchmarks on your docs to test which models are the best.

Most benchmarks test general capabilities, but what matters is how models handle your data and tasks. YourBench helps answer critical questions like:
- Do you really need a hundreds-of-billions-parameter model sledgehammer to crack a nut?
- Could a smaller, fine-tuned model work better?
- How well do different models understand your domain?

Some cool features:
šŸ“š Generates custom benchmarks from your own documents (PDFs, Word, HTML)
šŸŽÆ Tests models on real tasks, not just general capabilities
šŸ”„ Supports multiple models for different pipeline stages
šŸ§  Generate both single-hop and multi-hop questions
šŸ” Evaluate top models and deploy leaderboards instantly
šŸ’° Full cost analysis to optimize for your budget
šŸ› ļø Fully configurable via a single YAML file

26 SOTA models tested for question generation. Interesting finding: Qwen2.5 32B leads in question diversity, while smaller Qwen models and Gemini 2.0 Flash offer great value for cost.

You can also run it locally on any models you want.

I'm impressed. Try it out: yourbench/demo
posted an update 2 days ago
view post
Post
1559
šŸ”„ DeepSeek vibe coding with DeepSite is going viral with awesome projects!

From games to stunning visualizations, 7 wild examples:

šŸ“ŗ AI TV with custom channels and animations https://x.com/_akhaliq/status/1905747381951545647

šŸš€ Earth to Moon spacecraft journey visualization
Watch this incredible Three.js space simulation with zero external assets:
https://x.com/_akhaliq/status/1905836902533451999

šŸ’£ Minesweeper in 2.5 minutes! Built & deployed instantly on DeepSite. Zero setup needed:
https://x.com/cholf5/status/1906031928937218334

šŸŽ® Asked for Game of Life, got a masterpiece. Simple prompt, complex features. See it in action: https://x.com/pbeyssac/status/1906304454824992844

šŸ’« One-shot anime website with perfect UI. DeepSite turned a simple request into a fully-functional anime site: https://x.com/risphereeditor/status/1905961725028913264

šŸ“Š 10-minute World Indicators Dashboard. Just described what I wanted and got a full interactive dashboard! https://x.com/i/status/1906345214089785634

āœØ Ready to build without coding? Imagine it. Build it. Share it! enzostvs/deepsite
posted an update 3 days ago
view post
Post
2001
Want to vibecode with DeepSeek? Just spent 10 minutes with this space and created a full world indicators dashboard - literally just by describing what I wanted!

Anyone can now prototype and deploy projects instantly.

Try out the app: enzostvs/deepsite

My dashboard: fdaudens/world-indicators
published a Space 3 days ago