mlabonne 
posted an update Jan 20
🌳 Model Family Tree

Merging models has become a powerful way to compress information and build strong models cheaply. Right now, the process is still quite experimental: Which models should I merge? Which parameters should I use? We have some intuition but no principled approach.

I made a small tool to make things a little clearer. It lets you visualize the family tree of any model on the Hub. It also displays the type of license each model uses: permissive (green), noncommercial (red), and unknown (gray). It should help people select the right license based on the parent models.
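
To make the color coding concrete, here is a minimal sketch of how a license string could be mapped to the three colors. The license groupings and both function names are assumptions for illustration, not the notebook's actual lists or code. The "most restrictive parent wins" rule in `merged_color` is likewise just one conservative way to read a tree, not legal advice.

```python
# Hypothetical license groupings; the actual tool may classify differently.
PERMISSIVE = {"apache-2.0", "mit", "bsd-3-clause"}
NONCOMMERCIAL = {"cc-by-nc-4.0", "cc-by-nc-sa-4.0", "cc-by-nc-nd-4.0"}


def license_color(license_id):
    """Map a license identifier to green, red, or gray."""
    if not license_id:
        return "gray"  # no license metadata at all
    key = license_id.strip().lower()
    if key in PERMISSIVE:
        return "green"
    if key in NONCOMMERCIAL:
        return "red"
    return "gray"  # anything unrecognized is treated as unknown


def merged_color(parent_licenses):
    """Pick the most restrictive color among the parents: red > gray > green."""
    colors = {license_color(lic) for lic in parent_licenses}
    if not colors or "gray" in colors and "red" not in colors:
        return "gray"
    return "red" if "red" in colors else "green"
```

For example, `merged_color(["mit", "cc-by-nc-4.0"])` gives red, because one noncommercial parent is enough to restrict the merge under this rule.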

In addition, I hope it can be refined to extract more information about these models: do models from very different branches work better when merged? Can we select them based on the weight difference? There are a lot of questions to explore in this new space. :)

Here's a link to the colab notebook I made: https://colab.research.google.com/drive/1s2eQlolcI1VGgDhqWIANfkfKvcKrMyNr
If you want to know more about model merging or build your own merges, here's the article I wrote about this topic: https://huggingface.co/blog/mlabonne/merge-models

Thanks for sharing the Colab notebook, it's incredibly useful and easy to use!

This is super cool! I tried it today with https://huggingface.co/EmbeddedLLM/Mistral-7B-Merge-14-v0.1 and ended up with great results. I see that quite a bit of the base_model metadata comes from mergekit configs. Once that's added to the model card metadata, it will be quite easy to explore all these relationships!

Here's a HF Space for easier usage:

https://huggingface.co/spaces/mrfakename/merge-model-tree

Fantastic work! Thank you for putting this together!

Awesome tool!

BTW, I was trying to get a tree for https://huggingface.co/mlabonne/AlphaMonarch-7B and it got caught in a recursion loop. I first added caching on the ModelCard, assuming that would sort things out, but it didn't, so I hacked in some code to prevent revisits. I also added some weak handling for missing models, since those were causing loops as well (AIDC-ai-business/Marcoroni-7B-v3, for example, has disappeared).

Anyway, my updated code still has broken chart rendering (the graph is cyclic, which is what caused the looping issues), but at least it produces a list of the model lineage, which was good enough for my purposes. In case anyone wants to move this forward or needs a reference for looping issues: https://colab.research.google.com/drive/1-7w_pPWPCCQQpQ7LrvlKIdhyHsoCHH4E?usp=sharing
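
For anyone hitting the same loop, the revisit guard can be sketched roughly as follows. This is not the code from either notebook: `collect_lineage` and the `get_parents` callback are hypothetical names, and the callback is assumed to return a list of parent model IDs, or `None` when a model can no longer be found. In practice it would probably wrap a model card lookup. Here it is a plain dictionary so the sketch runs offline.

```python
def collect_lineage(root, get_parents):
    """Walk parent links from `root`, visiting each model at most once.

    Returns (edges, missing): edges maps each reachable model to its
    parents, and missing lists models whose metadata was unavailable.
    """
    edges = {}
    missing = []
    visited = set()
    stack = [root]
    while stack:
        model_id = stack.pop()
        if model_id in visited:
            continue  # already handled: this is what breaks cycles
        visited.add(model_id)
        parents = get_parents(model_id)
        if parents is None:
            missing.append(model_id)  # deleted or private model: record and skip
            continue
        edges[model_id] = list(parents)
        stack.extend(p for p in parents if p not in visited)
    return edges, missing


# Made-up model names: merge-a and merge-b list each other as parents (a cycle),
# and "gone-model" has no metadata.
fake_hub = {
    "merge-a": ["merge-b", "base-c"],
    "merge-b": ["merge-a"],
    "base-c": ["gone-model"],
}
edges, missing = collect_lineage("merge-a", fake_hub.get)
```

The traversal terminates even though the graph is cyclic, and `missing` ends up holding only `gone-model`. The cycle is still present in `edges`, so any tree-style rendering would need to handle it separately.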

Thanks a lot, @leonardlin! I fixed the issue you raised and updated the original notebook with your changes: https://colab.research.google.com/drive/1s2eQlolcI1VGgDhqWIANfkfKvcKrMyNr