Crystalcareai committed
Commit 89644b8
1 Parent(s): dbffbea

Create README.md

Files changed (1): README.md +54 -0
README.md ADDED
# GemMoE: Sharing Tools and Improved Base Models

I'm excited to share the tools I used to create GemMoE and release improved base models for the community to explore and build upon.

## Updates to GemMoE-Beta-1

GemMoE-Beta-1 will continue to serve as the repository for the `modeling_files` required to run the Mixture of Experts (MoE) models. However, I will be removing the PyTorch files from this repository.

## Sponsors

I am currently looking for compute sponsors to continue the post-training of GemMoE and bring it to its full potential. I've reached the limit of what I'm able to personally spend on this project. If you're interested in helping out, please reach out.

## New Models

I'm introducing two new models:

1. [Crystalcareai/GemMoE-Base-Hidden](https://huggingface.co/Crystalcareai/GemMoE-Base-Hidden)
   - This is a new MoE created using an improved method that I will explain below.
   - It utilizes a hidden gate and shows strong potential.
   - The model has not been altered and requires finetuning to reach its full potential.
   - If you're looking to achieve great performance with relatively minimal training, this is an excellent starting point.

2. [Crystalcareai/GemMoE-Base-Random](https://huggingface.co/Crystalcareai/GemMoE-Base-Random)
   - This model was created using the same merge method as GemMoE-Base-Hidden, but with a random gate.
   - It selects the experts randomly during the merging process.
   - With finetuning, the model learns to choose the appropriate experts naturally, potentially leading to better results than GemMoE-Base-Hidden.
   - This method offers an intriguing blend of the clown-car and Mixtral-style approaches.

The new merge method and modeling files also reduce VRAM usage, making the models easier to finetune.

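If you just want to poke at these checkpoints before committing to a finetune, loading should look roughly like the sketch below. This is only a sketch: it assumes the custom GemMoE modeling code is pulled in via `trust_remote_code=True`, and the dtype and `device_map` choices are illustrative rather than required.

```python
# Rough loading sketch -- assumes the custom GemMoE modeling files are fetched
# via trust_remote_code; dtype and device_map choices are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Crystalcareai/GemMoE-Base-Hidden"  # or Crystalcareai/GemMoE-Base-Random

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; fall back to float16 if bf16 is unsupported
    device_map="auto",           # requires accelerate
    trust_remote_code=True,      # needed for the custom MoE modeling code
)

prompt = "Mixture of Experts models work by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```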
## Training Experiences and Challenges

I have successfully trained the models on a single A100 using QLoRA, although it required careful monitoring and posed some difficulties; there currently appears to be an issue with QLoRA and GemMoE. I observed better VRAM usage with four A6000 cards, finetuning with DoRA (no quantization) under DeepSpeed ZeRO-3.

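For anyone trying the QLoRA route, a minimal setup sketch is below. The 4-bit settings, LoRA rank, and target module names are assumptions (standard Gemma-style projection names), not the exact configuration used for GemMoE, and recent `peft` releases can switch the same config to DoRA via `use_dora=True`.

```python
# Minimal QLoRA sketch -- the 4-bit settings, rank, and target module names are
# assumptions, not the exact configuration used for GemMoE.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "Crystalcareai/GemMoE-Base-Hidden"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # custom GemMoE modeling code
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                     # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed Gemma-style names; check the model's modules
    task_type="CAUSAL_LM",
    # use_dora=True,          # DoRA variant (recent peft releases)
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

From there the PEFT-wrapped model plugs into a standard `transformers` training loop; the multi-GPU DoRA route mentioned above would instead skip the quantization config and run under a DeepSpeed ZeRO-3 launcher.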
## Creating Your Own Merges

You can create your own merges using my modified branch of mergekit:

```bash
git clone -b gemmoe https://github.com/Crystalcareai/mergekit.git
cd mergekit && pip install -e .  # install the branch (standard mergekit editable install)
```

To create an exact replica of Crystalcareai/GemMoE-Base-Hidden, use the following command:

```bash
mergekit-moe examples/gemmoe.yml ./merged --cuda --lazy-unpickle --allow-crimes
```

Feel free to modify the `examples/gemmoe.yml` file to customize the merge according to your preferences.

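For orientation, mergekit MoE configs generally follow the shape sketched below. The model names and prompts here are placeholders, not the actual contents of `examples/gemmoe.yml`, and the gemmoe branch may expect additional or different fields; the `hidden` and `random` values of `gate_mode` correspond to GemMoE-Base-Hidden and GemMoE-Base-Random above.

```yaml
# Placeholder sketch of a mergekit MoE config -- not the actual examples/gemmoe.yml.
base_model: google/gemma-7b-it                # placeholder base model
gate_mode: hidden                             # "hidden" or "random", as in the two base models above
dtype: bfloat16
experts:
  - source_model: your-org/gemma-finetune-a   # placeholder expert
    positive_prompts:
      - "Write a short story about"
  - source_model: your-org/gemma-finetune-b   # placeholder expert
    positive_prompts:
      - "Solve the following math problem"
```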

Alternatively, you can use my modified lazymergekit available on Colab: [lazymergekit-Gemmoe](https://colab.research.google.com/drive/1WWxCE4NYvJNZkjFhkL79cf-dRc3xTpGn?usp=drive_link)

Happy experimenting and building!

-Lucas