Lambent committed
Commit
8affad3
1 Parent(s): 233c68c

Update README.md

Files changed (1):
  1. README.md +11 -1
README.md CHANGED
@@ -16,10 +16,20 @@ This is a merge of pre-trained language models created using [mergekit](https://
Testing training data validation:

- * Model Stock Loss: 0.451
+ * Model Stock 3/4 Loss: 0.451

My hypothesis that the pretraining was dragging down the stock merge's performance on training data in any way seems inaccurate.

+ Cosmopedia data validation:
+
+ * Model Stock 3/4 Loss: 1.021
+
+ On the other hand, it may indeed have pulled it towards forgetfulness.
+ This is a better loss vs. catastrophic forgetting than the prior Model Stock merge or any of the training methods.
+
+ I'm going to estimate that using the base model as an anchor point is a strong remedy for catastrophic forgetting when using multiple different training methods on the same dataset.
+ I'm less sure I can say anything about how it affects adaptation to the new dataset. It's possible that, if using this method, you'd want louder/stronger adaptation to start with than you otherwise would.
+
## Merge Details
### Merge Method
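
For readers unfamiliar with how the base model enters the merge, below is a minimal sketch of what a Model Stock configuration with a base-model anchor generally looks like in mergekit. The model names and dtype here are placeholders, not the checkpoints used in this commit; the actual configuration is the one listed under Merge Details in the README.

```yaml
# Hypothetical Model Stock merge config (placeholder model names).
# The base_model serves as the anchor point discussed above; the listed
# models are assumed to be fine-tunes of it on the same dataset with
# different training methods.
merge_method: model_stock
base_model: example-org/base-model            # anchor; not one of the fine-tunes
models:
  - model: example-org/finetune-method-a      # same dataset, training method A
  - model: example-org/finetune-method-b      # same dataset, training method B
  - model: example-org/finetune-method-c      # same dataset, training method C
dtype: bfloat16
```

Roughly speaking, Model Stock interpolates per layer between the average of the fine-tuned weights and the base model's weights, with the ratio set by the geometry (angles) of the fine-tunes, which is why the base model acts as the reference the merged weights are pulled back towards.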