
Frankenstein-MoE

Method

To initialize the gate projection weights of the MoE layers, we sampled from the H6 training set. We drew 400 samples and kept a final 30 with low perplexity (PPL).

For TruthfulQA, the data was generated with GPT-4.
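
The selection step described above (draw 400 samples, keep 30 with low PPL) could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names `perplexity` and `select_low_ppl` are hypothetical, the per-token log-probabilities are assumed to come from some scoring model that this card does not name, and the step that turns the selected samples into gate weights is not shown because it is not described here.

```python
import math
import random


def perplexity(token_logprobs):
    """Perplexity of one sample: exp of the negative mean token log-prob."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def select_low_ppl(samples, n_sample=400, n_keep=30, seed=0):
    """Randomly draw n_sample items, then keep the n_keep with the lowest PPL.

    `samples` is a list of (text, token_logprobs) pairs. The log-probs are
    an assumption: they would be produced by whatever model scores the data.
    Returns a list of (ppl, text) pairs sorted by ascending perplexity.
    """
    rng = random.Random(seed)
    drawn = rng.sample(samples, min(n_sample, len(samples)))
    scored = [(perplexity(logprobs), text) for text, logprobs in drawn]
    scored.sort(key=lambda pair: pair[0])
    return scored[:n_keep]
```

Sorting by perplexity and truncating keeps the samples the scoring model finds most predictable; a fixed seed makes the random draw reproducible.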

Evals

In progress.

Model size: 36.1B params (Safetensors)
Tensor type: FP16