LeroyDyer committed
Commit e178e12
1 Parent(s): 5124a19

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -11,10 +11,12 @@ tags:
 - chemistry
 - moe
 - merge
+- music
+- Cyber-Series
 ---
 
 Mixture of Experts (MoE) enables models to be pretrained with far less compute, which means you can dramatically scale up the model or dataset size with the same compute budget as a dense model. In particular, a MoE model should achieve the same quality as its dense counterpart much faster during pretraining. A gate network, or router, determines which tokens are sent to which expert. For example, the token “More” might be sent to the second expert and the token “Parameters” to the first. As we’ll explore later, we can send a token to more than one expert. How to route a token to an expert is one of the big decisions when working with MoEs - the router is composed of learned parameters and is pretrained at the same time as the rest of the network.
 
 Base Model
 mistralai/Mistral-7B-Instruct-v0.2
-The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1.
+The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1.
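
The routing mechanism the README paragraph describes can be illustrated with a short, generic sketch. The snippet below is a hypothetical top-2 token router in PyTorch; the class name `TopKRouter`, the hidden size, and the expert count are illustrative assumptions and are not taken from this model or its merge recipe.

```python
# Minimal sketch of MoE token routing as described above: a learned gate/router
# scores the experts for each token and the top-k experts process it.
# All sizes and names here are illustrative assumptions, not this repo's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    def __init__(self, hidden_size: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # The router is a learned linear layer, trained together with the rest of the network.
        self.gate = nn.Linear(hidden_size, n_experts, bias=False)

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, seq_len, hidden_size)
        logits = self.gate(tokens)                              # (batch, seq_len, n_experts)
        probs = F.softmax(logits, dim=-1)
        # A token can be sent to more than one expert: keep the top_k experts per token.
        weights, expert_ids = probs.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalise the kept weights
        return weights, expert_ids

# Usage example with dummy tokens.
router = TopKRouter(hidden_size=4096, n_experts=8, top_k=2)
tokens = torch.randn(1, 5, 4096)        # 5 dummy token embeddings
weights, expert_ids = router(tokens)
print(expert_ids[0])                    # which 2 of the 8 experts each token is routed to
```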