Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up

All HF Hub posts

Kseniase 
posted an update 1 day ago
view post
Post
2947
8 Emerging trends in Reinforcement Learning

Reinforcement learning is having a moment - and not just this week. Some of its directions are already showing huge promise, while others are still early but exciting. Here’s a look at what’s happening right now in RL:

1. Reinforcement Pre-Training (RPT) → Reinforcement Pre-Training (2506.08007)
Reframes next-token pretraining as RL with verifiable rewards, yielding scalable reasoning gains

2. Reinforcement Learning from Human Feedback (RLHF) → Deep reinforcement learning from human preferences (1706.03741)
The top approach. It trains a model using human preference feedback, building a reward model and then optimizing the policy to generate outputs people prefer

3. Reinforcement Learning with Verifiable Rewards (RLVR) → Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs (2506.14245)
Moves from subjective (human-labeled) rewards to objective ones that can be automatically verified, like in math, code, or rubrics as reward, for example → Reinforcement Learning with Rubric Anchors (2508.12790), Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains (2507.17746)

4. Multi-objective RL → Pareto Multi-Objective Alignment for Language Models (2508.07768)
Trains LMs to balance multiple goals at once, like being helpful but also concise or creative, ensuring that improving one goal doesn’t ruin another

5. Parallel thinking RL → Parallel-R1: Towards Parallel Thinking via Reinforcement Learning (2509.07980)
Trains parallel chains of thought, boosting math accuracy and final ceilings. It first teaches the model “parallel thinking” skill on easier problems, then uses RL to refine it on harder ones

Read further below ⬇️
And if you like this, subscribe to the Turing post: https://www.turingpost.com/subscribe

Also, check out our recent guide about the past, present and future of RL: https://www.turingpost.com/p/rlguide
·
Molbap 
posted an update about 15 hours ago
view post
Post
968
🚀 New blog: Maintain the unmaintainable – 1M+ Python LOC, 400+ models

How do you stop a million-line library built by thousands of contributors from collapsing under its own weight?
At 🤗 Transformers, we do it with explicit software-engineering tenets, principles that make the codebase hackable at scale.

🔍 Inside the post:
– One Model, One File: readability first — you can still open a modeling file and see the full logic, top to bottom.
– Modular Transformers: visible inheritance that cuts maintenance cost by ~15× while keeping models readable.
– Config-Driven Performance: FlashAttention, tensor parallelism, and attention scheduling are config-level features, not rewrites.

Written with @lysandre ,@pcuenq and @yonigozlan , this is a deep dive into how Transformers stays fast, open, and maintainable.

Read it here → transformers-community/Transformers-tenets
SelmaNajih001 
posted an update about 20 hours ago
view post
Post
858
Finally, I uploaded the model I developed for my master’s thesis! Given a financial event, it provides explained predictions based on a dataset of past news and central bank speeches.
Try it out here:
SelmaNajih001/StockPredictionExplanation
(Just restart the space and wait a minute)

The dataset used for RAG can be found here:
SelmaNajih001/FinancialNewsAndCentralBanksSpeeches-Summary-Rag
While the dataset used for the training is:
SelmaNajih001/FinancialClassification

I also wrote an article to explain how I've done the training. You can find it here https://huggingface.co/blog/SelmaNajih001/explainable-financial-predictions

sergiopaniego 
posted an update about 22 hours ago
view post
Post
1693
A few days ago, Thinking Machines Lab released “LoRA Without Regret”, showing that LoRA can match full fine-tuning performance when configured right.

Naturally, we decided to reproduce the results with TRL and release a guide!

https://huggingface.co/docs/trl/main/en/lora_without_regret
lunarflu 
posted an update about 20 hours ago
view post
Post
834
Cool stuff these past weeks on huggingface! 🤗 🚀 !
• 📈Trackio, local-first W&B alternative
https://github.com/gradio-app/trackio/issues
• 🌍EmbeddingGemma, 300M-param, multilingual embeddings, on-device
https://huggingface.co/blog/embeddinggemma
• 💻Open LLMs in VS Code (Inference Providers)
https://x.com/reach_vb/status/1966185427582497171
• 🤖Smol2Operator GUI agents
https://huggingface.co/blog/smol2operator
• 🖼️Gradio visible watermarking
https://huggingface.co/blog/watermarking-with-gradio
Parveshiiii 
posted an update 3 days ago
view post
Post
4249
Ever wanted an open‑source deep research agent? Meet Deepresearch‑Agent 🔍🤖

1. Multi‑step reasoning: Reflects between steps, fills gaps, iterates until evidence is solid.

2. Research‑augmented: Generates queries, searches, synthesizes, and cites sources.

3. Fullstack + LLM‑friendly: React/Tailwind frontend, LangGraph/FastAPI backend; works with OpenAI/Gemini.


🔗 GitHub: https://github.com/Parveshiiii/Deepresearch-Agent
MonsterMMORPG 
posted an update 3 days ago
view post
Post
3052
Ovi - Generate Videos With Audio Like VEO 3 or SORA 2 - Run Locally - Open Source for Free

Download and install : https://www.patreon.com/posts/140393220

Quick demo tutorial : https://youtu.be/uE0QabiHmRw

Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation

Project page : https://aaxwaz.github.io/Ovi/

SECourses Ovi Pro Premium App Features

Full scale ultra advanced app for Ovi - an open source project that can generate videos from both text prompts and image + text prompts with real audio.

Project page is here : https://aaxwaz.github.io/Ovi/

I have developed an ultra advanced Gradio app and much better pipeline that fully supports block swapping

Now we can generate full quality videos with as low as 8.2 GB VRAM

Hopefully I will work on dynamic on load FP8_Scaled tomorrow to improve VRAM even further

So more VRAM optimizations will come hopefully tomorrow

Our implemented block swapping is the very best one out there - I took the approach from famous Kohya Musubi tuner

The 1-click installer will install into Python 3.10.11 venv and will auto download models as well so it is literally 1-click

My installer auto installs with Torch 2.8, CUDA 12.9, Flash Attention 2.8.3 and it supports literally all GPUs like RTX 3000 series, 4000 series, 5000 series, H100, B200, etc

All generations will be saved inside outputs folder and we support so many features like batch folder processing, number of generations, full preset save and load

This is a rush release (in less than a day) so there can be errors please let me know and I will hopefully improve the app

Look the examples to understand how to prompt the model that is extremely important


RTX 5090 can run it without any block swap with just cpu-offloading - really fast
  • 2 replies
·
evijit 
posted an update about 13 hours ago
view post
Post
715
AI for Scientific Discovery Won't Work Without Fixing How We Collaborate.

My co-author @cgeorgiaw and I just published a paper challenging a core assumption: that the main barriers to AI in science are technical. They're not. They're social.

Key findings:

🚨 The "AI Scientist" myth delays progress: Waiting for AGI devalues human expertise and obscures science's real purpose: cultivating understanding, not just outputs.
📊 Wrong incentives: Datasets have 100x longer impact than models, yet data curation is undervalued.
⚠️ Broken collaboration: Domain scientists want understanding. ML researchers optimize performance. Without shared language, projects fail.
🔍 Fragmentation costs years: Harmonizing just 9 cancer files took 329 hours.

Why this matters: Upstream bottlenecks like efficient PDE solvers could accelerate discovery across multiple sciences. CASP mobilized a community around protein structure, enabling AlphaFold. We need this for dozens of challenges.

Thus, we're launching Hugging Science! A global community addressing these barriers through collaborative challenges, open toolkits, education, and community-owned infrastructure. Please find all the links below!

Paper: AI for Scientific Discovery is a Social Problem (2509.06580)
Join: hugging-science
Discord: https://discord.com/invite/VYkdEVjJ5J
404Zen 
posted an update 3 days ago
view post
Post
3855
Sora 2 invitation code, share your Sora 2 invitation code!

Also, you can use the imini AI platform directly: https://imini.com/

  • 1 reply
·
codelion 
posted an update about 14 hours ago
view post
Post
300
🚀 Adaptive Classifier v0.1.0: Now with ONNX Runtime Support!

We're excited to announce a major update to Adaptive Classifier - a flexible, continuous learning classification system that adapts to new classes without retraining!

What's New:

⚡ ONNX Runtime Integration: Get 1.14x faster CPU inference out of the box (up to 4x on x86 processors)

📦 INT8 Quantization: Models are now 4x smaller with minimal accuracy loss, making deployment easier and faster

🎯 Smart Loading: Automatically uses the best model variant for your hardware - quantized for speed by default, or unquantized for maximum accuracy

🔄 7.5x Faster Model Loading: Get started quickly with optimized model initialization

How It Works:

Adaptive Classifier lets you build text classifiers that continuously learn from new examples without catastrophic forgetting. Perfect for:
- Dynamic classification tasks where classes evolve over time
- Few-shot learning scenarios with limited training data
- Production systems that need to adapt to new categories

The new ONNX support means you get production-ready speed on CPU without any code changes - just load and run!

Try it now:

from adaptive_classifier import AdaptiveClassifier

# Load with ONNX automatically enabled (quantized for best performance)
classifier = AdaptiveClassifier.load("adaptive-classifier/llm-router")

# Add examples dynamically
classifier.add_examples(
["Route this to GPT-4", "Simple task for GPT-3.5"],
["strong", "weak"]
)

# Predict with optimized inference
predictions = classifier.predict("Complex reasoning task")

Check out our LLM Router model to see it in action:
adaptive-classifier/llm-router

GitHub Repository:
https://github.com/codelion/adaptive-classifier

Install now: pip install adaptive-classifier

We'd love to hear your feedback and see what you build with it!

#MachineLearning #NLP #ONNX #ContinuousLearning #TextClassification