Lambent committed on
Commit
0132ff0
1 Parent(s): e1c87fb

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -5,4 +5,8 @@ datasets:
 ---
 An untrained precursor MoE created from Cosmo using mergekit.
 
-Gate routing initialized using prompt hidden state method. Five are based on the visualized topic clusters of Cosmopedia data, three are task-oriented.
+Gate routing initialized using prompt hidden state method. Five are based on the visualized topic clusters of Cosmopedia data, three are task-oriented.
+
+Degenerate layers are 0, 1, and 2 (I believe this means experts will be underutilized for the lowest-level features).
+Best I could do with test-and-try prompt-based routing.
+Further research might start from the reversed direction, if available in some interpretability tool (activating layer into prompts).
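
The note about degenerate layers can be illustrated with a small sketch. It assumes (the README does not say this) that a layer counts as degenerate when two experts' gate vectors are nearly parallel, so the router can barely tell those experts apart and some of them end up underutilized. The names `cosine` and `degenerate_layers`, the toy vectors, and the 0.99 threshold are hypothetical; this is not mergekit's API and may not match the check mergekit actually runs.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def degenerate_layers(gate_vecs, threshold=0.99):
    """Return indices of layers where any two experts' gate vectors are nearly parallel.

    gate_vecs[layer][expert] is the gate vector for that expert at that layer,
    e.g. an average of prompt hidden states (an assumption for this sketch).
    """
    flagged = []
    for layer, experts in enumerate(gate_vecs):
        n = len(experts)
        too_similar = any(
            cosine(experts[i], experts[j]) > threshold
            for i in range(n)
            for j in range(i + 1, n)
        )
        if too_similar:
            flagged.append(layer)
    return flagged


# Toy data: two layers, two experts each, hidden size 2.
gate_vecs = [
    [[1.0, 0.0], [1.0, 0.001]],  # layer 0: experts almost indistinguishable
    [[1.0, 0.0], [0.0, 1.0]],    # layer 1: experts clearly separated
]
flagged = degenerate_layers(gate_vecs)
print(flagged)
```

Under this reading, early layers are the likeliest to be flagged: their hidden states mostly encode low-level token features that differ little between prompts, which fits the README's report for layers 0, 1, and 2.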