I love the interest and diversity of model stock, but after trying many different merges, I realized I had to try something else. Specifically, TIES. Model stock and TIES are almost polar opposites. TIES acts as an amplifier: when models agree, their task vectors align, and TIES strengthens those underlying weights. Things that are good about the models get amplified, and so do things that are bad. Model stock does the opposite, smoothing the weights out between models. If smoothing is the issue, let's try amplifying. I'd avoided TIES merges because I'm specifically trying to avoid some of the bad mannerisms of the base ablation Nemo model, but I tried it anyway. Wouldn't you know it, TIES preserved the EOS, and the model can actually shut up most of the time. Not only that, the result is good. Quite good. The instruct is simply better than prior Mirais, and I don't think it's by a small margin either. There are some quirks, but I'm still able to run inference without any penalties and with the same sampling settings I've been using. This really surprised me; I hadn't anticipated good results from TIES merging, but I'll eat my shoes now, it's good. The model is by no means perfect: some edge cases produce strange outputs, and the model will occasionally insert common phrases at the end of responses. Overall, though, I like the result.
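To make the amplifier intuition concrete, here's a rough numpy sketch of the TIES idea for a single weight tensor: trim each task vector to its largest entries, elect a per-parameter sign, then average only the entries that agree with that sign. The function name and the `density`/`lam` knobs are illustrative, not any library's actual API.

```python
import numpy as np

def ties_merge(base, finetuned, density=0.2, lam=1.0):
    """Sketch of TIES merging for one weight tensor (not a real merge tool)."""
    # Task vectors: what each fine-tune changed relative to the base.
    taus = [ft - base for ft in finetuned]

    # Trim: keep only the top-`density` fraction of entries by magnitude.
    trimmed = []
    for tau in taus:
        k = max(1, int(density * tau.size))
        thresh = np.sort(np.abs(tau).ravel())[-k]
        trimmed.append(np.where(np.abs(tau) >= thresh, tau, 0.0))

    # Elect a sign per parameter: the direction with more total mass wins.
    sign = np.sign(np.sum(trimmed, axis=0))
    sign[sign == 0] = 1.0

    # Disjoint merge: average only entries that agree with the elected sign,
    # so aligned task vectors reinforce each other instead of cancelling.
    agree = [np.where(np.sign(t) == sign, t, 0.0) for t in trimmed]
    counts = np.sum([a != 0 for a in agree], axis=0)
    merged = np.sum(agree, axis=0) / np.maximum(counts, 1)

    return base + lam * merged
```

This is where the amplification comes from: where fine-tunes disagree in sign, the losing direction is dropped entirely rather than averaged toward zero.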
# TIES Is Really, Really Slow, and Also Evolutions or Something
TIES takes something like 10-15x longer than model stock. It has to calculate a bunch of fancy vectors and directions, and that's slow. In practice, this means it's even slower for me to iterate on evolutions. Speaking of which, which evolution is this? Well, this is where it gets weird, because the previous evolution was 13, but all of those were done as model stock merges, and I just decided to switch to TIES out of the blue. This means basically none of the other evolutions were relevant to the analysis of this one, since changing the merging strategy essentially rebases the whole project. So this is sort of evolution 1, despite already having many models incorporated. I'll be calling it evolution 1, but just know this was the actual reality of the situation.