Crystalcareai committed: Update README.md

README.md CHANGED
@@ -12,10 +12,7 @@ license_link: https://ai.google.dev/gemma/terms
 
 
 
-Note: If you wish to use GemMoE while it is in beta, you must
-
-```bash
-pip install git+https://github.com/Crystalcareai/transformers.git@GemMoE#egg=transformers
+Note: If you wish to use GemMoE while it is in beta, you must flag trust_remote_code=True in your training config.
 ```
 
 
@@ -40,7 +37,7 @@ Adapting Mergekit to support Gemma was no small feat, and I had to make signific
 
 I want to extend my gratitude to Maxime Labonne for his incredibly useful LLM course and colab notebooks, which helped me level up my fine-tuning skills. Jon Durbin's bagel GitHub repository was a crash course in what makes good data, and it played a crucial role in informing my data selection process. The transparency and example set by Teknium inspired me to turn my AI side hustle into a full-time gig, and for that, I am deeply grateful.
 
-Locutusque's datasets
+Locutusque's datasets were a prime example of aggregating data like a pro. Justin Lin from Alibaba research supported my previous project, Qwen1.5 - 8x7b, which laid the foundation for GemMoE. I also want to thank Deepmind for releasing Gemma and acknowledge the hard work and dedication of everyone who contributed to its development.
 
 The Deepseek team's DeepseekMoE paper was a game-changer for me, providing critical insights into what makes an MoE as good as possible. I am also incredibly grateful to the entire Perplexity team, whose answer engine accelerated my education and understanding of AI by a factor of five (source: vibes).
 
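For context on the change above: the commit replaces the old instruction to install a forked transformers build with a note that beta users should enable trust_remote_code=True. The sketch below shows what that might look like when loading the model with stock transformers; it is a minimal illustration, not part of the commit. The repo id is a placeholder (the diff does not name a specific checkpoint), and the prompt and generation settings are assumptions.

```python
# Minimal sketch: loading a GemMoE beta checkpoint with trust_remote_code=True.
# The repo id below is a hypothetical placeholder, not taken from the commit.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Crystalcareai/GemMoE-Beta-1"  # placeholder; substitute the actual GemMoE checkpoint

# trust_remote_code=True lets transformers run the custom GemMoE modeling code
# shipped with the checkpoint, which the beta note says is required.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("GemMoE is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same trust_remote_code=True argument would also go wherever the model and tokenizer are instantiated in a fine-tuning script, which is presumably what the note's "training config" refers to.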