
An untrained precursor mixture-of-experts (MoE) model created from Cosmo using mergekit.

Gate routing was initialized using the prompt hidden-state method. Five experts are based on the visualized topic clusters of the Cosmopedia data; the remaining three are task-oriented.

Layers 0, 1, and 2 were degenerate. The expert gates for those layers have been randomly initialized in the hope of mitigating this.
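The setup described above could be sketched as a mergekit-moe configuration. This is an illustrative fragment, not the exact config used: the base model name and the prompts are assumptions, and in practice there would be eight expert entries (five topic-based, three task-oriented).

```yaml
# Hypothetical mergekit-moe config sketch; model name and prompts are illustrative.
base_model: HuggingFaceTB/cosmo-1b   # assumed Cosmo checkpoint
gate_mode: hidden                    # initialize gates from prompt hidden states
dtype: float32
experts:
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts:
      - "example prompt drawn from a Cosmopedia topic cluster"
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts:
      - "example task-oriented prompt"
  # ... six more experts along the same lines
```

With `gate_mode: hidden`, mergekit computes hidden states for the positive prompts and uses them to initialize each layer's router weights, which matches the prompt hidden-state method mentioned above.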

Model size: 10.2B params · Tensor type: F32 (Safetensors)
Dataset used to train Lambent/cosmoem-8x1B