llama.cpp is 26.8% faster than Ollama. I upgraded both to their latest versions and, using the same settings, ran the same DeepSeek R1 Distill 1.5B model on the same hardware. It's an apples-to-apples comparison.
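For context, here is a rough sketch of how decode throughput could be compared across the two runtimes. It assumes a llama-server instance on localhost:8080 and the Ollama daemon on its default port; the model tag and response field names are taken from the two projects' server docs, and the prompt is illustrative:

```python
import time
import requests

PROMPT = "Explain the Pythagorean theorem step by step."

def bench_llamacpp(prompt, n_predict=256):
    # llama.cpp's llama-server exposes a /completion endpoint;
    # its JSON response reports how many tokens were generated.
    t0 = time.time()
    r = requests.post(
        "http://localhost:8080/completion",
        json={"prompt": prompt, "n_predict": n_predict},
    )
    elapsed = time.time() - t0
    n_tokens = r.json().get("tokens_predicted", n_predict)
    return n_tokens / elapsed

def bench_ollama(prompt):
    # Ollama's REST API reports eval_count (generated tokens)
    # and eval_duration (nanoseconds) directly.
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "deepseek-r1:1.5b", "prompt": prompt, "stream": False},
    )
    data = r.json()
    return data["eval_count"] / (data["eval_duration"] / 1e9)

print(f"llama.cpp: {bench_llamacpp(PROMPT):.1f} tok/s")
print(f"ollama:    {bench_ollama(PROMPT):.1f} tok/s")
```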
We are reproducing the full DeepSeek R1 data and training pipeline so that everybody can use their recipe. Instead of doing it in secret, we can do it together in the open!
🧪 Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1.
🧠 Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.
🔥 Step 3: show we can go from base model -> SFT -> RL via multi-stage training (a rough sketch of this flow follows below).
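For readers who want to experiment now, here is a minimal sketch of the Step 3 flow using TRL's SFTTrainer and GRPOTrainer (GRPO is the RL algorithm described in the DeepSeek-R1 paper). The base model, dataset names, and reward function are illustrative placeholders, not the actual Open-R1 recipe:

```python
# Hypothetical sketch of a multi-stage base -> SFT -> RL pipeline with TRL.
# Model ID, dataset names, and the reward function are placeholders.
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig, GRPOTrainer, GRPOConfig

BASE_MODEL = "Qwen/Qwen2.5-1.5B"  # placeholder base model

# Stage 1: supervised fine-tuning on a distilled reasoning corpus
sft_dataset = load_dataset("my-org/distilled-reasoning", split="train")  # placeholder
sft_trainer = SFTTrainer(
    model=BASE_MODEL,
    train_dataset=sft_dataset,
    args=SFTConfig(output_dir="sft-checkpoint"),
)
sft_trainer.train()

# Stage 2: RL with GRPO on prompts with verifiable answers
def reward_correct_answer(completions, **kwargs):
    # Toy reward: 1.0 if the completion contains a final-answer marker, else 0.0.
    # A real recipe would check the answer against a ground-truth verifier.
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

rl_dataset = load_dataset("my-org/math-prompts", split="train")  # placeholder
rl_trainer = GRPOTrainer(
    model="sft-checkpoint",
    reward_funcs=reward_correct_answer,
    train_dataset=rl_dataset,
    args=GRPOConfig(output_dir="rl-checkpoint"),
)
rl_trainer.train()
```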
We've rolled out a major update to ZeroGPU! All the Spaces are now running on it.
Major improvements:
1. GPU cold starts are about twice as fast!
2. RAM usage reduced by two-thirds, allowing more effective resource usage, meaning more GPUs for the community!
3. ZeroGPU initializations (cold starts) can now be tracked and displayed (use progress=gr.Progress(track_tqdm=True); a minimal example follows below)
4. Improved compatibility and PyTorch integration, increasing the number of ZeroGPU-compatible Spaces without requiring any modifications!
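Here is a minimal sketch of a ZeroGPU Space that surfaces progress with track_tqdm; the pipeline and prompt handling are illustrative:

```python
# Minimal illustrative ZeroGPU Space; the model and UI are placeholders.
import gradio as gr
import spaces  # ZeroGPU helper package available on Hugging Face Spaces
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.to("cuda")

@spaces.GPU  # requests a ZeroGPU slot for the duration of the call
def generate(prompt, progress=gr.Progress(track_tqdm=True)):
    # track_tqdm=True relays the pipeline's internal tqdm bars
    # (including the ZeroGPU init) to the Gradio UI as progress updates.
    return pipe(prompt).images[0]

gr.Interface(fn=generate, inputs="text", outputs="image").launch()
```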
Feel free to ask in the post if you have any questions
I am experimenting with Flux and trying to push it to its limits without training (as I am GPU-poor). I found some flaws in the pipelines, which I resolved, and I can now generate images of approximately the same quality as 4-step Flux Schnell in just 1 step. Demo Link: KingNish/Realtime-FLUX
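The demo's specific pipeline fixes aren't shown here; as a baseline for comparison, this is roughly what stock 1-step generation with FLUX.1-schnell looks like in diffusers (model ID and parameters follow the standard diffusers usage, while the quality improvements are the demo's own work):

```python
# Baseline 1-step FLUX generation with the stock diffusers pipeline,
# not the modified pipeline behind the Realtime-FLUX demo.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps on GPUs with limited VRAM

image = pipe(
    "a photo of a red fox in a snowy forest",
    num_inference_steps=1,   # single denoising step
    guidance_scale=0.0,      # schnell is distilled for guidance-free sampling
    max_sequence_length=256,
).images[0]
image.save("fox_1step.png")
```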