I propose "merge densification", a style of merge that attempts to transfer the benefits of a denser model to a base model. The merge weight in this case is 0.02, which is atypically small for merges but high compared to the learning rates used during training. Here, the expected result is more creative text generation. More details below:
grimjim/kunoichi-lemon-royale-v3-32K-7B
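
To give a feel for how small a 0.02 weight is, here is a minimal sketch of a low-weight blend. It assumes a plain linear interpolation between two models with identical parameter names and shapes; the post does not state which merge method was used, so treat the `blend` function, the toy `base` and `donor` dicts, and the parameter name `layer.weight` as hypothetical illustrations rather than the actual recipe.

```python
def blend(base, donor, weight=0.02):
    """Linear blend per parameter: (1 - weight) * base + weight * donor.

    Assumption: both models expose the same parameter names and shapes.
    This is an illustrative sketch, not the merge method used in the post.
    """
    if base.keys() != donor.keys():
        raise ValueError("models must share parameter names")
    merged = {}
    for name, base_values in base.items():
        donor_values = donor[name]
        if len(base_values) != len(donor_values):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [
            (1.0 - weight) * b + weight * d
            for b, d in zip(base_values, donor_values)
        ]
    return merged


# Toy "models": one flattened parameter tensor each (hypothetical values).
base = {"layer.weight": [1.0, -2.0, 0.5]}
donor = {"layer.weight": [2.0, 0.0, 0.5]}

merged = blend(base, donor)
print(merged)
```

With a weight of 0.02, each merged parameter moves only 2% of the way from the base value toward the donor value, so the base model's behavior should dominate.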