Crystalcareai committed: Update README.md

README.md CHANGED
@@ -12,10 +12,7 @@ license_link: https://ai.google.dev/gemma/terms
 
 
 
-Note: If you wish to use GemMoE while it is in beta, you must
-
-```bash
-pip install git+https://github.com/Crystalcareai/transformers.git@GemMoE#egg=transformers
+Note: If you wish to use GemMoE while it is in beta, you must flag trust_remote_code=True in your training config.
 ```
 
 
@@ -40,7 +37,7 @@ Adapting Mergekit to support Gemma was no small feat, and I had to make signific
 
 I want to extend my gratitude to Maxime Labonne for his incredibly useful LLM course and colab notebooks, which helped me level up my fine-tuning skills. Jon Durbin's bagel GitHub repository was a crash course in what makes good data, and it played a crucial role in informing my data selection process. The transparency and example set by Teknium inspired me to turn my AI side hustle into a full-time gig, and for that, I am deeply grateful.
 
-Locutusque's datasets
+Locutusque's datasets were a prime example of aggregating data like a pro. Justin Lin from Alibaba research supported my previous project, Qwen1.5 - 8x7b, which laid the foundation for GemMoE. I also want to thank Deepmind for releasing Gemma and acknowledge the hard work and dedication of everyone who contributed to its development.
 
 The Deepseek team's DeepseekMoE paper was a game-changer for me, providing critical insights into what makes an MoE as good as possible. I am also incredibly grateful to the entire Perplexity team, whose answer engine accelerated my education and understanding of AI by a factor of five (source: vibes).
 
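For context on the change above: the commit replaces the old instruction to install a forked transformers build with a note that beta users should enable trust_remote_code=True. The sketch below shows what that might look like when loading the model with stock transformers; it is a minimal illustration, not part of the commit. The repo id is a placeholder (the diff does not name a specific checkpoint), and the prompt and generation settings are assumptions.

```python
# Minimal sketch: loading a GemMoE beta checkpoint with trust_remote_code=True.
# The repo id below is a hypothetical placeholder, not taken from the commit.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Crystalcareai/GemMoE-Beta-1"  # placeholder; substitute the actual GemMoE checkpoint

# trust_remote_code=True lets transformers run the custom GemMoE modeling code
# shipped with the checkpoint, which the beta note says is required.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("GemMoE is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same trust_remote_code=True argument would also go wherever the model and tokenizer are instantiated in a fine-tuning script, which is presumably what the note's "training config" refers to.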