Mixture of Experts, Branch-Train-Merge, International Cooperation, Reuse, https://github.com/ontocord/MDEL
This is our OLD landing page. Please visit our new organization page at [Multi-Domain-Expert-Learning](https://huggingface.co/Multi-Domain-Expert-Learning)
🍩 Ontocord.AI 🍩 and the open source community.
Volunteers from: Bedrock AI, TurkuNLP, ETH, Redmond.AI, Incite, MICS CentraleSupelec, Centro de Excelência em Inteligência Artificial, VietAI, Technion - Israel Institute of Technology, Nous Research, University of Western Australia, KoboldAI Community, LAION.AI, Mila, Luleå University of Technology, Juelich Supercomputing Center, Tokyo Tech, RIKEN, Together
Open sourcing AI models can lead to increased innovation, accessibility, transparency, and community building. However, we need a mechanism to train more capable models in an efficient and modular way.
Our proposed method for open-source language models, which we call Multi-Domain Expert Learning (MDEL), involves branching from a base model, training each branch independently on a specific domain (updating only specific layers or other adapters), and merging the trained branches at the end. The domain-specific layers or adapters are kept as experts, with a classifier acting as a router that activates the relevant experts during inference. This approach makes it easy to expand a model's expertise, to train additional "adapters" independently, and to reuse previously trained experts and models without retraining, resulting in a modular and efficient system.
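To make the routing idea concrete, here is a minimal toy sketch of expert selection at inference time. Everything in it is a hypothetical stand-in: the real router would be a trained classifier over a prompt representation, and the real experts are domain-trained layers or adapters inside a language model, not string-returning functions.

```python
# Toy sketch of MDEL-style routed inference. All names (make_expert, route,
# EXPERTS, KEYWORDS) are illustrative assumptions, not project code.

def make_expert(domain):
    """Stand-in for a set of domain-specialized layers or adapters."""
    def expert(prompt):
        return f"[{domain} expert] response to: {prompt}"
    return expert

EXPERTS = {
    "medicine": make_expert("medicine"),
    "law": make_expert("law"),
    "code": make_expert("code"),
}

# Toy domain signal; a trained router would replace this keyword table.
KEYWORDS = {
    "medicine": {"patient", "dose", "symptom"},
    "law": {"contract", "statute", "liability"},
    "code": {"function", "compile", "bug"},
}

def route(prompt):
    """Toy classifier: score each domain by keyword overlap with the prompt."""
    words = set(prompt.lower().split())
    scores = {d: len(words & kw) for d, kw in KEYWORDS.items()}
    return max(scores, key=scores.get)

def generate(prompt):
    """Activate only the expert the router selects for this prompt."""
    return EXPERTS[route(prompt)](prompt)
```

Because each expert is an independent, swappable component, new domains can be added by training one more expert and registering it with the router, without retraining the rest.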
In this effort, we seek international labs and open-source-aligned researchers and companies in various countries to each train a set of domain experts of their choosing, enabling international participation and knowledge sharing. This will also lower training costs and environmental impact through reuse and reduced energy usage. We currently have volunteers from four continents and are looking for more.
We will be using a variant of the c-BTM method (https://arxiv.org/pdf/2303.14177v1.pdf) and will be focusing on models ranging from 7B to 70B parameters.
In some of our models, we will also add multilingual and multimodal abilities for both understanding and generation, with context lengths of 8K-35K tokens.
Languages will include: Hindi (hi), Vietnamese (vi), English (en), Japanese (ja), and Finnish (fi). We may add others if compute is available.
If you are interested in contributing to this project, please reach out to us and learn more about how you can get involved at firstname.lastname@example.org.
Let's work together to create open-source models that benefit everyone! 🤝 #AI #MDEL #Supercomputers #Summit #OpenSource #Innovation #VolunteersNeeded #OpenScience #DemocratizeAI
** Why did we change the term "Layer" to "Learning"? Because, in addition to layer-wise experts, we are also exploring different adapters and architectures like Flamingo (https://arxiv.org/abs/2204.14198), EMU (https://arxiv.org/abs/2307.05222), and a novel multi-node architecture for training LoRAs that we call lora-x, which will allow us to swap out different component experts to improve the model's performance.