---
license: apache-2.0
datasets:
  - HuggingFaceTB/cosmopedia
---

# cosmoem-8x1B

An untrained precursor Mixture-of-Experts (MoE) model, created from Cosmo using mergekit.

Gate routing was initialized using the prompt hidden-state method. Five of the eight experts are based on the visualized topic clusters of the Cosmopedia data; the other three are task-oriented.
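
For reference, here is a minimal sketch of the kind of mergekit-moe config this describes (`gate_mode: hidden` is mergekit's prompt hidden-state method). The base model path and every prompt below are illustrative assumptions, not the values actually used:

```yaml
# Hypothetical mergekit-moe config; source models and prompts are placeholders.
base_model: HuggingFaceTB/cosmo-1b
gate_mode: hidden   # initialize routing from hidden states of the prompts
dtype: bfloat16
experts:
  # Five experts keyed to Cosmopedia topic clusters (placeholder prompts)
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts:
      - "An educational article explaining a scientific concept"
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts:
      - "A short story written for young readers"
  # ... three more topic-cluster experts in the same pattern ...
  # Three task-oriented experts (placeholder prompts)
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts:
      - "Answer the following question step by step:"
  # ... two more task-oriented experts ...
```

A config like this would be run with `mergekit-moe config.yaml ./cosmoem-8x1B` to produce the merged checkpoint.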

The gate initialization was degenerate for layers 0, 1, and 2, so the expert gates for those layers have been randomly initialized in the hope of mitigating this.
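
Below is a minimal sketch of one way such a re-initialization can be done, assuming a Mixtral-style layout where each decoder layer's router lives at `model.layers[i].block_sparse_moe.gate` (the attribute path, checkpoint paths, and init scale are assumptions):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the merged MoE checkpoint (path is a placeholder).
model = AutoModelForCausalLM.from_pretrained(
    "cosmoem-8x1B", torch_dtype=torch.bfloat16
)

# Layers whose hidden-state gate initialization came out degenerate.
DEGENERATE_LAYERS = [0, 1, 2]

with torch.no_grad():
    for idx in DEGENERATE_LAYERS:
        # Router: nn.Linear(hidden_size, num_experts) in Mixtral-style models.
        gate = model.model.layers[idx].block_sparse_moe.gate
        torch.nn.init.normal_(gate.weight, mean=0.0, std=0.02)

model.save_pretrained("cosmoem-8x1B-reinit")
```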