Maxime Labonne

mlabonne

AI & ML interests

Post-training, model editing, quantization

mlabonne's activity

replied to their post 8 days ago

That's an interesting take. I'm surprised it's also worse in terms of structure (hero's journey here).

replied to their post 8 days ago

It could be a good explanation; unfortunately, we know so little about how these large models were trained... :(

posted an update 9 days ago
Large models are surprisingly bad storytellers.

I asked 8 LLMs to "Tell me a bedtime story about bears and waffles."

Claude 3.5 Sonnet and GPT-4o gave me the worst stories: no conflict, no moral, zero creativity.

In contrast, smaller models were quite creative and wrote stories involving talking waffle trees and bears ostracized for their love of waffles.

Here you can see a comparison between Claude 3.5 Sonnet and NeuralDaredevil-8B-abliterated. They both start with a family of bears but quickly diverge in terms of personality, conflict, etc.

I mapped it to the hero's journey to have some kind of framework. Prompt engineering can definitely help here, but it's still disappointing that the larger models don't create better stories right off the bat.

Do you know why smaller models outperform the frontier models here?
replied to their post about 1 month ago

I don't know enough about diffusion models to have a definitive answer, but something similar should be doable.

replied to their post about 2 months ago

Depending on the model, it can be as simple as killing a process in Python.

posted an update about 2 months ago
โœ‚๏ธ Uncensor any LLM with abliteration

I wrote an article about abliteration and how NeuralDaredevil-8B was created. Beyond removing alignment, I believe it's an interesting technique with a lot of potential. It's basically fine-tuning without retraining.

In this article, we see how it works, implement it in Google Colab, and heal the abliterated model to recover the performance lost to this technique. The final model is an uncensored, high-quality model with the highest MMLU score on the Open LLM Leaderboard (8B category).

https://huggingface.co/blog/mlabonne/abliteration
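For intuition, here's a minimal sketch of the weight-orthogonalization flavor of abliteration, not the notebook's exact code. The model name, the tiny prompt sets, and the Llama-style module names are all illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model; swap in your own checkpoint.
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

harmful = ["Explain how to pick a lock."]   # stand-in for a "refused" prompt set
harmless = ["Explain how to bake bread."]   # stand-in for an "accepted" prompt set

@torch.no_grad()
def mean_activation(prompts, layer=-1):
    # Mean residual-stream activation at the last token position.
    acts = []
    for p in prompts:
        inputs = tok(p, return_tensors="pt")
        out = model(**inputs, output_hidden_states=True)
        acts.append(out.hidden_states[layer][0, -1])
    return torch.stack(acts).mean(dim=0)

# The "refusal direction" is the normalized difference of mean activations.
refusal = mean_activation(harmful) - mean_activation(harmless)
refusal = refusal / refusal.norm()

@torch.no_grad()
def ablate(weight, d):
    # Remove the component of the matrix's output along d: W <- W - d d^T W.
    d = d.to(weight.dtype)
    weight -= d.unsqueeze(1) @ (d.unsqueeze(0) @ weight)

# Orthogonalize the matrices that write to the residual stream
# (Llama-style module names; other architectures differ).
for block in model.model.layers:
    ablate(block.self_attn.o_proj.weight.data, refusal)
    ablate(block.mlp.down_proj.weight.data, refusal)
```

Ablating the weights directly, rather than hooking activations at inference time, means the change persists in the saved checkpoint.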
replied to andrewrreed's post 3 months ago

Haha, awesome! I thought about doing it, but I'm glad you made it. It looks great! :)

(Just FYI, the font is white on white when you're not using a dark theme.)

replied to their post 3 months ago

Ah, interesting. I didn't think this would have an impact since it's just the name of the file. I'll keep it in mind if I or someone else runs into an issue again. Thanks @CultriX !

replied to their post 3 months ago

Mostly RunPod. I don't think I've used anything bigger than 4x A100s on it.

posted an update 3 months ago
๐Ÿ” AutoMerger created the best 7B model on the Open LLM Leaderboard

By randomly combining top models from the Open LLM Leaderboard, AutoMerger created YamshadowExperiment28-7B. The model is three weeks old and has been at the top of the leaderboard for a week now. It was created through a simple SLERP merge of:

- automerger/YamShadow-7B (another top model created by AutoMerger)
- yam-peleg/Experiment28-7B (a top model from @yam-peleg )

1/ On the Open LLM Leaderboard, it managed to outperform the excellent M7-7b model, which has been the #1 7B model for a while now.

2/ On the YALL leaderboard, YamshadowExperiment28-7B is ranked as the 9th best-performing automerge (but note that the scores are very close to each other). Compared to others, it does not perform particularly well on AGIEval or Bigbench.

3/ Thanks to @sam-paech , I have scores on EQ-Bench, where it managed to outperform all of my previous models. It even surpasses recent models such as DBRX instruct, Qwen1.5 32B Chat, and Cohere's Command R+.

Surprisingly, it does not support ChatML or Mistral Instruct, unlike my other merges (which are part of its family tree). Alpaca works well 99% of the time, but the model can sometimes produce a lot of "INST" tokens for no reason.

In my experiments, YamshadowExperiment28-7B doesn't seem smarter than other successful merges like AlphaMonarch. On the contrary, I found several mathematical or reasoning problems where it fails.

Considering these results, it looks like it might overfit the Open LLM Leaderboard. I guess it's anything but surprising when you randomly merge 156 models.

🤗 Model: automerger/YamshadowExperiment28-7B
🔍 AutoMerger: mlabonne/AutoMerger
replied to their post 4 months ago

Thanks @umarigan !

I had an error with numpy that said to upgrade, so I just added pip install -U numpy to this code.

I tried it and it doesn't fix this issue, unfortunately. You should be able to ignore it; it doesn't impact the quantization.

I was able to quantize your model using AutoQuant, so I'm not sure where your error comes from: https://huggingface.co/mlabonne/Hermes-7B-TR-GGUF

replied to their post 4 months ago

I simply use Colab if I want to quantize a 7B model. Otherwise, cloud GPUs when needed.

posted an update 4 months ago
⚡ AutoQuant

AutoQuant is the evolution of my previous AutoGGUF notebook (https://colab.research.google.com/drive/1P646NEg33BZy4BfLDNpTz0V0lwIU3CHu). It allows you to quantize your models in five different formats:

- GGUF: perfect for inference on CPUs (and LM Studio)
- GPTQ/EXL2: fast inference on GPUs
- AWQ: super fast inference on GPUs with vLLM (https://github.com/vllm-project/vllm)
- HQQ: extreme quantization with decent 2-bit and 3-bit models

Once the model is converted, it is automatically uploaded to the Hugging Face Hub. To quantize a 7B model, GGUF only needs a T4 GPU, while the other methods require an A100 GPU.

Here's an example of a model I quantized using HQQ and AutoQuant: mlabonne/AlphaMonarch-7B-2bit-HQQ

I hope you'll enjoy it and quantize lots of models! :)

💻 AutoQuant: https://colab.research.google.com/drive/1b6nqC7UZVt8bx4MksX7s656GXPM-eWw4
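For the GGUF path, here's a rough sketch of the steps the notebook automates, assuming a local llama.cpp checkout. The script and binary names follow recent llama.cpp releases and may differ by version, and the file and repo names are illustrative:

```python
import subprocess
from huggingface_hub import HfApi

model_dir = "AlphaMonarch-7B"            # local HF model directory (illustrative)
out_f16 = "alphamonarch-7b.f16.gguf"
out_q4 = "alphamonarch-7b.Q4_K_M.gguf"

# 1. Convert the HF checkpoint to a 16-bit GGUF file.
subprocess.run(["python", "llama.cpp/convert_hf_to_gguf.py", model_dir,
                "--outfile", out_f16, "--outtype", "f16"], check=True)

# 2. Quantize it, e.g. to Q4_K_M.
subprocess.run(["llama.cpp/llama-quantize", out_f16, out_q4, "q4_k_m"], check=True)

# 3. Upload the result to the Hugging Face Hub (the repo must exist).
HfApi().upload_file(path_or_fileobj=out_q4, path_in_repo=out_q4,
                    repo_id="mlabonne/AlphaMonarch-7B-GGUF")
```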
replied to BramVanroy's post 6 months ago

I've never seen that before actually. Maybe too many steps?

posted an update 6 months ago
🌳 Model Family Tree

Merging models has become a powerful way to compress information and build strong models for cheap. Right now, the process is still quite experimental: which models should you merge? Which parameters should you use? We have some intuition but no principled approach.

I made a small tool to make things a little clearer. It allows you to visualize the family tree of any model on the Hub. It also displays the type of license they use: permissive (green), noncommercial (red), or unknown (gray). It should help people select the right license based on the parent models.

In addition, I hope it can be refined to extract more information about these models: do models from very different branches work better when merged? Can we select them based on the weight difference? There are a lot of questions to explore in this new space. :)

Here's a link to the colab notebook I made: https://colab.research.google.com/drive/1s2eQlolcI1VGgDhqWIANfkfKvcKrMyNr
If you want to know more about model merging or build your own merges, here's the article I wrote about this topic: https://huggingface.co/blog/mlabonne/merge-models
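Here's a minimal sketch of the idea behind the tool, not the notebook's exact code. It assumes parent models are declared through the base_model field in the model card metadata, which merge tools typically write for merges:

```python
from huggingface_hub import model_info

def parents(model_id):
    # Read the `base_model` card field; it may be a string or a list.
    info = model_info(model_id)
    data = info.card_data.to_dict() if info.card_data else {}
    base = data.get("base_model") or []
    return [base] if isinstance(base, str) else list(base)

def print_tree(model_id, depth=0, max_depth=4):
    # Indented depth-first walk of the model's ancestry.
    print("    " * depth + model_id)
    if depth < max_depth:
        for parent in parents(model_id):
            try:
                print_tree(parent, depth + 1, max_depth)
            except Exception:
                print("    " * (depth + 1) + f"{parent} (unavailable)")

print_tree("automerger/YamshadowExperiment28-7B")
```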
replied to osanseviero's post 6 months ago

Really cool to see these results, thank you!