That's super cool, congrats! :)
Maxime Labonne PRO
mlabonne
AI & ML interests
Post-training, model editing, quantization
Recent Activity
new activity
3 days ago
mlabonne/gemma-3-27b-it-abliterated-v2:Re-released?

reacted to drwlf's post with ❤️🤗
24 days ago
Post
5411
Having an insanely good medical LLM is pointless if it won’t answer your questions!
So we’ve made 2 notebooks for abliterating any model, to get a good model that will actually help you!
The notebooks are made using @mlabonne’s abliteration logic and datasets!
Feel free to use them and happy training 😊
https://github.com/dralexlup/LLM-Abliteration

replied to their post
27 days ago
Will do at some point, but I don't have time to write this down at the moment.

reacted to burtenshaw's post with 🚀❤️🤗
3 months ago
Post
3352
NEW UNIT in the Hugging Face Reasoning course. We dive deep into the algorithm behind DeepSeek R1 with an advanced and hands-on guide to interpreting GRPO.
🔗
reasoning-course
This unit is super useful if you’re tuning models with reinforcement learning. It will help with:
- interpreting loss and reward progression during training runs
- selecting effective parameters for training
- reviewing and defining effective reward functions
This unit also works up smoothly toward the existing practical exercises from @mlabonne and Unsloth.
📣 Shout out to @ShirinYamani who wrote the unit. Follow for more great content.
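
For reading reward curves in a GRPO run, it can help to see how rewards turn into advantages. Below is a minimal sketch, assuming the group-relative normalization usually described for GRPO (each completion's reward minus the group mean, divided by the group standard deviation). The function name, the `eps` term, and the choice of population standard deviation are illustrative assumptions, not code from the course unit.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions sampled for the same prompt.

    Assumption: population standard deviation; real implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every reward in the group is identical
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two rewarded, two not (made-up values)
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every completion got the same reward
flat_advantages = group_relative_advantages([2.0, 2.0, 2.0])
```

When every completion in a group gets the same reward, all advantages come out as zero, so that prompt contributes no learning signal. That is one thing to check when the reward curve plateaus.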

posted an update
4 months ago
Post
16238
✂️ AutoAbliteration
I made a Colab notebook to automatically abliterate models.
It's quite general, so you can do interesting stuff like blocking a given language in the model outputs.
💻 Colab: https://colab.research.google.com/drive/1RmLv-pCMBBsQGXQIM8yF-OdCNyoylUR1?usp=sharing
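
For readers new to the idea, here is a minimal NumPy sketch of the general recipe usually described as abliteration: estimate a direction from the difference of mean activations on two contrasting prompt sets, then orthogonalize a weight matrix against it. This is not the notebook's code. The toy activations, shapes, and function names are assumptions for illustration, and a real run would collect activations from the model itself. Swapping in a different pair of prompt sets (for example, one language versus others) is presumably what makes the approach general.

```python
import numpy as np


def contrast_direction(acts_a, acts_b):
    """Unit vector along the difference of mean activations (set A minus set B)."""
    diff = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return diff / np.linalg.norm(diff)


def orthogonalize(weight, direction):
    """Remove the component of the layer's output that lies along `direction`.

    Assumes `weight` has shape (d_model, d_in) and writes into the residual stream.
    """
    return weight - np.outer(direction, direction @ weight)


# Toy data (hypothetical): set A is shifted along the first coordinate
rng = np.random.default_rng(0)
d_model = 8
acts_a = rng.normal(size=(16, d_model)) + 2.0 * np.eye(d_model)[0]
acts_b = rng.normal(size=(16, d_model))

direction = contrast_direction(acts_a, acts_b)
W = rng.normal(size=(d_model, 4))
W_abl = orthogonalize(W, direction)

# After the edit, the layer can no longer write along `direction`
residual = direction @ W_abl
```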

posted an update
4 months ago
Post
6347
✂️ Gemma 3 Abliterated
I noticed that Gemma 3 was much more resilient to refusal removal than other models like Qwen 2.5.
I experimented with different recipes and improved the abliteration technique I wrote about last year.
It's still experimental but the refusal rate is super low in my tests. Enjoy!
mlabonne/gemma-3-4b-it-abliterated
mlabonne/gemma-3-12b-it-abliterated
mlabonne/gemma-3-27b-it-abliterated

reacted to burtenshaw's post with 🤗❤️
4 months ago
Post
3938
I’m super excited to work with @mlabonne to build the first practical example in the reasoning course.
🔗
reasoning-course
Here's a quick walk through of the first drop of material that works toward the use case:
- a fundamental introduction to reinforcement learning. Answering questions like, ‘what is a reward?’ and ‘how do we create an environment for a language model?’
- Then it focuses on DeepSeek R1 by walking through the paper and highlighting key aspects. This is an old school way to learn ML topics, but it always works.
- Next, it takes you to Transformers Reinforcement Learning and demonstrates potential reward functions you could use. This is cool because it uses Marimo notebooks to visualise the reward.
- Finally, Maxime walks us through a real training notebook that uses GRPO to reduce generation length. I’m really into this because it works, and Maxime took the time to validate it and share assets and logging from his own runs for you to compare with.
Maxime’s work and notebooks have been a major part of the open source community over the last few years. I, like everyone, have learnt so much from them.
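
To make "a reward function that reduces generation length" concrete, here is a small sketch. It is a hypothetical reward, not the one from Maxime's notebook: the function name, the word-count measure, `target_len`, and the linear decay are all assumptions. The shape (a list of completions in, a list of floats out, extra keyword arguments ignored) is modeled loosely on how reward functions are commonly written for GRPO training, and it assumes plain string completions.

```python
def length_reward(completions, target_len=200, **kwargs):
    """Hypothetical reward: 1.0 at or under `target_len` words,
    decaying linearly to 0.0 at twice the target."""
    rewards = []
    for text in completions:
        n_words = len(text.split())
        if n_words <= target_len:
            rewards.append(1.0)
        else:
            overshoot = (n_words - target_len) / target_len
            rewards.append(max(0.0, 1.0 - overshoot))
    return rewards


# Made-up completions scored against a 4-word target
scores = length_reward(
    ["short answer", "one two three four five", "a b c d e f g h i j"],
    target_len=4,
)
```

A length reward like this would normally be combined with a correctness or quality reward, otherwise the cheapest way to score well is to say almost nothing.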

reacted to sometimesanotion's post with 🚀
5 months ago
Post
3362
**Update** Either I had some wrong numbers plugged in to estimate benchmark numbers from comparator, or the benchmark changed. Virtuoso Small v2 at 41.07 average is still very impressive, especially for writing draft copy for business purposes, while Lamarck remains a chatty generalist-reasoning model.
I've felt confident that 14B Qwen finetunes and merges could break the 42.0 average, and Arcee **came close** with https://huggingface.co/arcee-ai/Virtuoso-Small-2. Congratulations to @arcee-ai !
Just two months ago, it was easy to think that 14B had plateaued, that you could have high IFEVAL or high MUSR/MATH/GPQA at 14B, but not both. That barrier is completely shattered. I see a pathway to even better, and Virtuoso Small 2 is a big part of why. Very impressive work. This community would expect no less from Arcee.
Just look at this graph! Keep in mind, my merges here build on the first Virtuoso Small, and *-DS merges build on DeepSeek R1. There are some impressive merges in the pipe!
hahaha

reacted to m-ric's post with ❤️🔥👀
5 months ago
Post
3456
Today we make the biggest release in smolagents so far: 𝘄𝗲 𝗲𝗻𝗮𝗯𝗹𝗲 𝘃𝗶𝘀𝗶𝗼𝗻 𝗺𝗼𝗱𝗲𝗹𝘀, 𝘄𝗵𝗶𝗰𝗵 𝗮𝗹𝗹𝗼𝘄𝘀 𝘁𝗼 𝗯𝘂𝗶𝗹𝗱 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹 𝘄𝗲𝗯 𝗯𝗿𝗼𝘄𝘀𝗶𝗻𝗴 𝗮𝗴𝗲𝗻𝘁𝘀! 🥳
Our agents can now casually open up a web browser, and navigate on it by scrolling, clicking elements on the webpage, going back, just like a user would.
The demo below shows Claude-3.5-Sonnet browsing GitHub for the task: "Find how many commits the author of the current top trending repo did over last year."
Hi @mlabonne !
Go try it out, it's the most cracked agentic stuff I've seen in a while 🤯 (well, along with OpenAI's Operator who beat us by one day)
For more detail, read our announcement blog 👉 https://huggingface.co/blog/smolagents-can-see
The code for the web browser example is here 👉 https://github.com/huggingface/smolagents/blob/main/examples/vlm_web_browser.py

posted an update
6 months ago
Post
6648
🆕 LLM Course 2025 edition!
I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.
The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.
I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.
Thanks everyone, hope you'll enjoy it!
💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course
that looks great, well done!

reacted to CultriX's post with ❤️
6 months ago
Post
2141
# Space for Multi-Agent Workflows using AutoGen
Hi all, I created this "AutoGen Multi-Agent Workflow" space that allows you to experiment with multi-agent workflows.
By default, it allows code generation with built-in quality control and automatic documentation generation. It achieves this by leveraging multiple AI agents working together to produce high-quality code snippets, ensuring they meet the specified requirements.
In addition to the default, the space allows users to set custom system messages for each assistant, potentially completely changing the workflow.
# Workflow Steps
1. User Input:
- The user defines a prompt, such as "Write a random password generator using python."
- Outcome: A clear task for the primary assistant to accomplish.
2. Primary Assistant Work:
- The primary assistant begins working on the provided prompt.
It generates an initial code snippet based on the user's request.
- Outcome: An initial proposal for the requested code.
3. Critic Feedback:
- The critic reviews the generated code and either provides feedback or, if the output meets the criteria, broadcasts the APPROVED message.
(This process repeats until the output is APPROVED or 10 messages have been exchanged.)
- Outcome: A revised Python function that incorporates the critic's feedback.
4. Documentation Generation:
- Once the code is approved, it is passed to a documentation assistant.
The documentation assistant generates concise documentation for the final code.
- Outcome: Short documentation including a function description, parameters, and return values.
Enjoy!
CultriX/AutoGen-MultiAgent-Example
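
The workflow steps above can be sketched in plain Python. This is a rough illustration of the control flow only: it does not use AutoGen, the three agents are stub functions standing in for LLM calls, and every name here is hypothetical. What happens when the 10-message cap is hit without approval is not stated in the post, so skipping documentation in that case is an assumption.

```python
def run_workflow(prompt, primary, critic, documenter, max_messages=10):
    """Primary drafts, critic reviews, repeat until APPROVED or the message cap."""
    draft = primary(prompt, None)
    log = [("primary", draft)]
    approved = False
    while len(log) < max_messages:
        feedback = critic(draft)
        log.append(("critic", feedback))
        if feedback.strip() == "APPROVED":
            approved = True
            break
        if len(log) >= max_messages:
            break
        draft = primary(prompt, feedback)
        log.append(("primary", draft))
    # Assumption: only approved code is handed to the documentation step
    docs = documenter(draft) if approved else None
    return draft, docs, log


# Stub agents standing in for LLM-backed assistants
def primary(prompt, feedback):
    if feedback is None:
        return "def gen(): return 'abc'"
    return "import secrets\ndef gen(n=12): return secrets.token_urlsafe(n)"


def critic(code):
    return "APPROVED" if "secrets" in code else "Use the secrets module."


def documenter(code):
    return "gen(n): returns a random URL-safe password string."


code, docs, log = run_workflow(
    "Write a random password generator using python.", primary, critic, documenter
)
```

With these stubs the loop runs one revision round: first draft, critic feedback, revised draft, then APPROVED.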

reacted to burtenshaw's post with ❤️
7 months ago
Post
2831
For anyone looking to boost their LLM fine-tuning and alignment skills this December: we're running a free and open course called smol course. It’s not big like Li Yin and @mlabonne, it’s just smol.
👷 It focuses on practical use cases, so if you’re working on something, bring it along.
👯♀️ It’s peer reviewed and open so you can discuss and get feedback.
🤘 If you’re already a smol pro, feel free to drop a star or issue.
> > Part 1 starts now, and it’s on instruction tuning!
https://github.com/huggingface/smol-course