---
license: apache-2.0
datasets:
  - HuggingFaceTB/cosmopedia
pipeline_tag: text-classification
---

An untrained precursor MoE created from Cosmo using mergekit.

Gate routing was initialized using the prompt hidden-state method. Five of the eight experts are based on the visualized topic clusters of the Cosmopedia data; the other three are task-oriented.
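For illustration, a mergekit-moe config in this spirit might look like the sketch below. This is an assumption-laden reconstruction, not the config actually used: the base model (`HuggingFaceTB/cosmo-1b`), the dtype, and every prompt are placeholders. `gate_mode: hidden` selects mergekit's prompt hidden-state gate initialization.

```yaml
# Hypothetical mergekit-moe config; base model and prompts are illustrative.
base_model: HuggingFaceTB/cosmo-1b   # assumed base; the card only says "Cosmo"
gate_mode: hidden                    # initialize gates from prompt hidden states
dtype: bfloat16
experts:
  # five topic experts (stand-ins for the visualized Cosmopedia clusters)
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts: ["Photosynthesis converts sunlight into chemical energy."]
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts: ["The fall of the Roman Empire reshaped Mediterranean trade."]
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts: ["A binary search tree keeps its keys in sorted order."]
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts: ["The protagonist walked slowly along the moonlit shore."]
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts: ["Interest rates influence how much consumers borrow and spend."]
  # three task experts (stand-ins for the task-oriented gates)
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts: ["Summarize the following passage in one sentence:"]
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts: ["Answer the question using only the given context:"]
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts: ["Write a short story based on the following prompt:"]
```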

Layers 0, 1, and 2 have degenerate gates (I believe this means the experts will be underutilized for the lowest-level features). That was the best I could do with trial-and-error prompt-based routing. Further research might start from the reverse direction, if some interpretability tool supports it (mapping a layer's activations back to prompts).
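One rough way to check which layers ended up degenerate is to compare the per-layer router rows. This sketch assumes a Mixtral-style layout (which mergekit-moe produces), where each decoder layer exposes its router as `block_sparse_moe.gate`; the model path and the 0.99 cosine threshold are placeholders, not values from this card.

```python
# Sketch: flag layers whose expert gate vectors are nearly parallel
# (i.e., degenerate routing). Assumes a Mixtral-style MoE checkpoint.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/cosmoem-8x1B", torch_dtype=torch.float32  # placeholder path
)

for i, layer in enumerate(model.model.layers):
    gate = layer.block_sparse_moe.gate.weight        # (num_experts, hidden_size)
    g = F.normalize(gate.float(), dim=-1)
    sim = g @ g.T                                    # pairwise cosine similarity
    off_diag = sim[~torch.eye(sim.size(0), dtype=torch.bool)]
    m = off_diag.mean().item()
    flag = "  <- degenerate?" if m > 0.99 else ""    # heuristic threshold
    print(f"layer {i:2d}: mean pairwise cosine = {m:.3f}{flag}")
```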