Alfitaria/MN-12B-solracht-EXPERIMENTAL-011425

inflatebot committed 24 days ago

Commit a3b6a81 · verified · 1 Parent(s): 0b7470c

Update README.md

Files changed (1)

README.md +2 -3

@@ -17,9 +17,8 @@ This is a merge of pre-trained language models created using [mergekit](https://
 This is an experimental release of MN-12B-Mag-Mell, to test the NuSLERP feature in Mergekit. **The expectation is that this model behaves exactly like Mag Mell R1.**
-It has been observed in testing that it doesn't produce literally the same outputs, despite being in theory a replication of legacy SLERP behavior with NuSLERP hyperparameters, but I'm posting this *so that people can tell me whether or not this is the case.*
-To reiterate: **The expectation is that this has ==the exact same problems== that Mag Mell does.**
+It has been observed in testing that it doesn't produce literally the same outputs, despite being in theory a replication of legacy SLERP behavior with NuSLERP hyperparameters. After pondering while this was uploading, it appears likely that the reason for the difference is that DARE pruned different sets of parameters each time.
+To reiterate: **The expectation is that this has ==the exact same problems== that Mag Mell does.** I'm posting this *so that people can tell me whether or not this is the case.*
 ### Merge Method
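
The updated README text attributes the differing outputs to DARE pruning different sets of parameters on each run. A minimal sketch of that idea, assuming the usual drop-and-rescale description of DARE (randomly zero a fraction of the delta parameters, then rescale the survivors), is shown below. The `dare_prune` helper, the seeds, and the toy `delta` values are hypothetical and for illustration only; this is not Mergekit's implementation, and it makes no claim about how Mergekit seeds its random number generator.

```python
import random


def dare_prune(delta, drop_rate, seed):
    """Hypothetical helper: zero each delta entry with probability
    drop_rate and rescale the kept entries by 1 / (1 - drop_rate)."""
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - drop_rate)
    return [d * scale if rng.random() >= drop_rate else 0.0 for d in delta]


# Toy "fine-tuned minus base" deltas; every entry is non-zero so that
# a different mask always yields a different pruned vector.
delta = [0.01 * (i + 1) for i in range(1000)]

run_a = dare_prune(delta, drop_rate=0.5, seed=1)
run_b = dare_prune(delta, drop_rate=0.5, seed=2)
run_a_again = dare_prune(delta, drop_rate=0.5, seed=1)

# Different random draws give different pruned deltas, so two merges
# built from them would not match exactly.
differs = run_a != run_b

# Fixing the seed makes the pruning repeatable.
reproducible = run_a == run_a_again

print(differs, reproducible)
```

If the two merges were built from unseeded random masks like `run_a` and `run_b`, their weights would differ slightly even with identical merge settings, which would be consistent with the observation in the README.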