Lambent committed
Commit
8affad3
1 Parent(s): 233c68c

Update README.md

Files changed (1):
  1. README.md +11 -1
README.md CHANGED
@@ -16,10 +16,20 @@ This is a merge of pre-trained language models created using [mergekit](https://
Testing training data validation:

- * Model Stock Loss: 0.451
+ * Model Stock 3/4 Loss: 0.451

My hypothesis that the pretraining was dragging down the stock merge's performance on training data in any way seems inaccurate.

+ Cosmopedia data validation:
+
+ * Model Stock 3/4 Loss: 1.021
+
+ On the other hand, it may indeed have pulled it towards forgetfulness.
+ This is a better loss vs. catastrophic forgetting than the prior Model Stock merge or any of the training methods.
+
+ I'm going to estimate that using the base model as an anchor point is a strong remedy for catastrophic forgetting when using multiple different training methods on the same dataset.
+ I'm less sure I can say anything about how it affects adaptation to the new dataset. It's possible that, if using this method, you'd want louder/stronger adaptation to start with than you otherwise would.
+
## Merge Details
### Merge Method
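
For readers unfamiliar with how the base model enters the merge, below is a minimal sketch of what a Model Stock configuration with a base-model anchor generally looks like in mergekit. The model names and dtype here are placeholders, not the checkpoints used in this commit; the actual configuration is the one listed under Merge Details in the README.

```yaml
# Hypothetical Model Stock merge config (placeholder model names).
# The base_model serves as the anchor point discussed above; the listed
# models are assumed to be fine-tunes of it on the same dataset with
# different training methods.
merge_method: model_stock
base_model: example-org/base-model            # anchor; not one of the fine-tunes
models:
  - model: example-org/finetune-method-a      # same dataset, training method A
  - model: example-org/finetune-method-b      # same dataset, training method B
  - model: example-org/finetune-method-c      # same dataset, training method C
dtype: bfloat16
```

Roughly speaking, Model Stock interpolates per layer between the average of the fine-tuned weights and the base model's weights, with the ratio set by the geometry (angles) of the fine-tunes, which is why the base model acts as the reference the merged weights are pulled back towards.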