The new DeepSite space is really insane for vibe-coders enzostvs/deepsite
With the wave of vibe-coding-optimized LLMs like the latest open-source DeepSeek model (version V3-0324), you can basically prompt out-of-the-box and create any app and game in one-shot.
It feels so powerful to me, no more complex framework or under-the-hood prompt engineering to have a working text-to-app tool.
AI is eating the world and *open-source* AI is eating AI itself!
PS: and even more meta is that the DeepSite app and DeepSeek model are both fully open-source code => time to start recursively improve?
PPS: you still need some inference hosting unless you're running the 600B param model at home, so check the very nice list of HF Inference Providers for this model: deepseek-ai/DeepSeek-V3-0324
It's beating Claude 3.7 on (competitive) programming –a domain Anthropic has been historically really strong at– and it's getting close to o1-mini/R1 on olympiad level coding with just 7B parameters!
And the best part is that we're open-sourcing all about its training dataset, the new IOI benchmark, and more in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3
Gemma3 family is out! Reading the tech report, and this section was really interesting to me from a methods/scientific fairness pov.
Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**. (Which everybody does, but people usually don't say)
For a tech report, it makes a lot of sense to report model performance when used optimally! On leaderboards on the other hand, comparison will be apples to apples, but in a potentially unoptimal way for a given model family (like some user interact sub-optimally with models)
Also contains a cool section (6) on training data memorization rate too! Important to see if your model will output the training data it has seen as such: always an issue for privacy/copyright/... but also very much for evaluation!
Because if your model knows its evals by heart, you're not testing for generalization.
I’m excited to introduce a new leaderboard UI + keyboard shortcuts on the TTS Arena!
The refreshed UI for the leaderboard is smoother and (hopefully) more intuitive. You can now view models based on a simpler win-rate percentage and exclude closed models.
In addition, the TTS Arena now supports keyboard shortcuts. This should make voting much more efficient as you can now vote without clicking anything!
In both the normal Arena and Battle Mode, press "r" to select a random text, Cmd/Ctrl + Enter to synthesize, and "a"/"b" to vote! View more details about keyboard shortcuts by pressing "?" (Shift + /) on the Arena.
We applied the same data-driven approach that led to SOTA English performance in🍷 FineWeb to thousands of languages.
🥂 FineWeb2 has 8TB of compressed text data and outperforms other multilingual datasets in our experiments.
The dataset is released under the permissive 📜 ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.
We will very soon announce a big community project, and are working on a 📝 blogpost walking you through the entire dataset creation process. Stay tuned!