mlabonne's activity
Haha, awesome! I thought about doing it, but I'm glad you made it. It looks great! :)
(Just FYI, the text is white on a white background when you're not using a dark theme.)
Ah interesting, I didn't think this would have an impact since it's just the name of the file. I'll keep it in mind if I or someone else runs into an issue again. Thanks @CultriX !
Hmm, I'm not sure I understand: did you replace "f16" with "fp16" to convert the model? "f16" should work; it's documented and used in llama.cpp (e.g., https://github.com/ggerganov/llama.cpp/blob/132f55795e51094954f1b1f647f97648be724a3a/scripts/pod-llama.sh#L93). I've also run this command with "f16" without any issue so far. Did you also run it on Colab?
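For reference, this is roughly how I invoke the conversion (a minimal sketch; the model path and output file are placeholders, not anyone's exact setup):

```python
import subprocess

# Call llama.cpp's conversion script with the documented "f16" output type
# (not "fp16"). Paths here are placeholders.
subprocess.run(
    ["python", "llama.cpp/convert.py", "path/to/model", "--outtype", "f16"],
    check=True,
)
```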
Mostly RunPod. I don't think I've used anything bigger than 4xA100s on it.
By randomly combining top models from the Open LLM Leaderboard, AutoMerger created YamshadowExperiment28-7B. The model is three weeks old and has been at the top of the leaderboard for a week now. It was created through a simple SLERP merge of:
- automerger/YamShadow-7B (another top model created by AutoMerger)
- yam-peleg/Experiment28-7B (a top model from @yam-peleg )
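For those curious about what a SLERP merge actually does, here's a minimal sketch of the core idea (the real merge was done via mergekit; this simplified version operates on flattened weight tensors):

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors."""
    # Angle between the two (normalized) weight vectors.
    v0_n = v0 / (np.linalg.norm(v0) + eps)
    v1_n = v1 / (np.linalg.norm(v1) + eps)
    theta = np.arccos(np.clip(np.dot(v0_n, v1_n), -1.0, 1.0))
    if np.abs(theta) < 1e-4:
        # Nearly colinear vectors: fall back to plain linear interpolation.
        return (1 - t) * v0 + t * v1
    # Interpolate along the arc between the two vectors.
    return (np.sin((1 - t) * theta) * v0 + np.sin(t * theta) * v1) / np.sin(theta)

# Toy usage: merge two random "weight" vectors at the midpoint.
merged = slerp(0.5, np.random.randn(16), np.random.randn(16))
```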
1/ On the Open LLM Leaderboard, it managed to outperform the excellent M7-7b model, which has been the #1 7B model for a while now.
2/ On the YALL leaderboard, YamshadowExperiment28-7B is ranked as the 9th best-performing automerge (but note that the scores are very close to each other). Compared to others, it does not perform particularly well on AGIEval or Bigbench.
3/ Thanks to @sam-paech , I have scores on EQ-Bench, where it managed to outperform all of my previous models. It even surpasses recent models such as DBRX instruct, Qwen1.5 32B Chat, and Cohere's Command R+.
Surprisingly, it does not support ChatML or Mistral Instruct, unlike my other merges (which are part of its family tree). Alpaca works well 99% of the time, but the model can sometimes produce a lot of "INST" tokens for no reason.
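For context, the Alpaca template I'm referring to is the standard one:

```python
# Standard Alpaca prompt template (the format that works reliably with this model).
ALPACA_TEMPLATE = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""

prompt = ALPACA_TEMPLATE.format(instruction="Explain model merging in one sentence.")
```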
In my experiments, YamshadowExperiment28-7B doesn't seem smarter than other successful merges like AlphaMonarch. On the contrary, I found several mathematical or reasoning problems where it fails.
Considering these results, it looks like it might overfit the Open LLM Leaderboard. I guess that's anything but surprising when you randomly merge 156 models.
Model: automerger/YamshadowExperiment28-7B
AutoMerger: mlabonne/AutoMerger
Thanks @umarigan !
I had an error with numpy that said to upgrade, so I just added "pip install -U numpy" to this code.
I tried it and it doesn't fix this issue, unfortunately. You should be able to ignore it; it doesn't impact the quantization.
I was able to quantize your model using AutoQuant, so not sure where your error comes from: https://huggingface.co/mlabonne/Hermes-7B-TR-GGUF
I simply use Colab if I want to quantize a 7B model. Otherwise, cloud GPUs when needed.
AutoQuant is the evolution of my previous AutoGGUF notebook (https://colab.research.google.com/drive/1P646NEg33BZy4BfLDNpTz0V0lwIU3CHu). It allows you to quantize your models in five different formats:
- GGUF: perfect for inference on CPUs (and LM Studio)
- GPTQ/EXL2: fast inference on GPUs
- AWQ: super fast inference on GPUs with vLLM (https://github.com/vllm-project/vllm)
- HQQ: extreme quantization with decent 2-bit and 3-bit models
Once the model is converted, it automatically uploads it to the Hugging Face Hub. To quantize a 7B model, GGUF only needs a T4 GPU, while the other methods require an A100 GPU.
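To give an idea of what the notebook automates for GGUF, here's a hedged sketch (binary paths, file names, and the repo ID are placeholders, not AutoQuant's exact code):

```python
import subprocess
from huggingface_hub import HfApi

# Quantize the f16 GGUF file with llama.cpp's quantize binary
# (the binary location and file names are placeholders).
subprocess.run(
    ["./llama.cpp/quantize", "model.fp16.gguf", "model.Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)

# Upload the result to the Hugging Face Hub
# ("your-username/Model-GGUF" is a hypothetical repo ID).
api = HfApi()
api.create_repo("your-username/Model-GGUF", exist_ok=True)
api.upload_file(
    path_or_fileobj="model.Q4_K_M.gguf",
    path_in_repo="model.Q4_K_M.gguf",
    repo_id="your-username/Model-GGUF",
)
```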
Here's an example of a model I quantized using HQQ and AutoQuant: mlabonne/AlphaMonarch-7B-2bit-HQQ
I hope you'll enjoy it and quantize lots of models! :)
AutoQuant: https://colab.research.google.com/drive/1b6nqC7UZVt8bx4MksX7s656GXPM-eWw4
Thanks a lot @leonardlin ! I fixed the issue you raised and updated the original notebook with your changes: https://colab.research.google.com/drive/1s2eQlolcI1VGgDhqWIANfkfKvcKrMyNr
I've never seen that before actually. Maybe too many steps?
Merging models has become a powerful way to compress information and build performant models cheaply. Right now, the process is still quite experimental: which models should you merge? Which parameters should you use? We have some intuition but no principled approach.
I made a small tool to make things a bit clearer. It allows you to visualize the family tree of any model on the Hub. It also displays the type of license each model uses: permissive (green), noncommercial (red), or unknown (gray). This should help people select the right license based on the parent models.
In addition, I hope it can be refined to extract more information about these models: do models from very different branches work better when merged? Can we select them based on the weight difference? There are a lot of questions to explore in this new space. :)
Here's a link to the colab notebook I made: https://colab.research.google.com/drive/1s2eQlolcI1VGgDhqWIANfkfKvcKrMyNr
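The core idea is straightforward. Here's a hedged sketch (not the notebook's exact code) that recursively walks the "base_model" metadata that merged models usually declare in their model cards:

```python
from huggingface_hub import ModelCard

def get_parents(model_id: str) -> list[str]:
    # Merged models usually list their parents in the model card's
    # "base_model" metadata field (a string or a list of strings).
    try:
        card = ModelCard.load(model_id)
    except Exception:
        return []  # no card available: treat as a root model
    parents = getattr(card.data, "base_model", None)
    if parents is None:
        return []
    return [parents] if isinstance(parents, str) else list(parents)

def print_tree(model_id: str, depth: int = 0) -> None:
    # Naive recursive walk; the notebook also tracks licenses and draws a graph.
    print("  " * depth + model_id)
    for parent in get_parents(model_id):
        print_tree(parent, depth + 1)

print_tree("automerger/YamshadowExperiment28-7B")
```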
If you want to know more about model merging or build your own merges, here's the article I wrote about this topic: https://huggingface.co/blog/mlabonne/merge-models
Really cool to see these results, thank you!