"This is a mega merge"

#1
by deleted - opened

Then it has to be labeled as a merge.

Owner • edited Apr 8

They are never just merges, friend.
They are a product of fine-tuning and merging as well as DPO/SFT training; there is a system to it. As you know, the models are themed to a specific topic, hence being fine-tuned on the aforementioned data and often deliberately overfit, with the loss driven down toward 0.0, hence being so focused on that specific theme.
Just to give you a hint about the training:
The prompts in the datasets may even need to be removed and new prompts entered, guiding the role as well as the task.
When it comes to tasks, the outputs need to be varied and rich, as well as well formatted and thought out.
Whereas role play needs looseness, i.e. casual speech patterns in both prompts and responses. Sometimes merges include models from others which were specifically tuned for lewdness or other unwanted types of speech, hence crafting the roles accordingly; a rough sketch of this prompt-rewriting step is below.
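As a minimal sketch of that prompt-rewriting step (the dataset repo, column layout, and system prompt below are hypothetical, not taken from any particular model card):

```python
# Hypothetical sketch: replace a dataset's original system prompts with a new
# prompt that guides the role and the task before SFT/DPO training.
from datasets import load_dataset

NEW_SYSTEM_PROMPT = (
    "You are a story-focused roleplay assistant. "  # example theme only
    "Stay in character and keep the speech casual but well formatted."
)

def rewrite_prompts(example):
    # Drop any original system prompt and insert the new guiding prompt.
    messages = [m for m in example["messages"] if m["role"] != "system"]
    example["messages"] = [{"role": "system", "content": NEW_SYSTEM_PROMPT}] + messages
    return example

dataset = load_dataset("my-org/chat-dataset", split="train")  # placeholder repo
dataset = dataset.map(rewrite_prompts)
```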
When merging, the originals are actually lost as well; remnants of their data remain, so the merge consumes them, allowing the tensors to become even more complex and dense. But in truth it is all just a merge calculation which seems to complement the data, often combining it, even unlocking data and removing the refusals, which we do not need.
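At its simplest, that merge calculation is just a weighted average of two same-architecture checkpoints; a minimal sketch, with placeholder repo names and an arbitrary 0.5 ratio:

```python
# Minimal linear-merge sketch: average the tensors of two checkpoints that
# share the same architecture. Repo names and the ratio are placeholders.
import torch
from transformers import AutoModelForCausalLM

model_a = AutoModelForCausalLM.from_pretrained("my-org/model-a", torch_dtype=torch.bfloat16)
model_b = AutoModelForCausalLM.from_pretrained("my-org/model-b", torch_dtype=torch.bfloat16)

alpha = 0.5  # merge ratio: both originals are "consumed" into one set of tensors
state_a, state_b = model_a.state_dict(), model_b.state_dict()
merged = {name: alpha * state_a[name] + (1 - alpha) * state_b[name] for name in state_a}

model_a.load_state_dict(merged)
model_a.save_pretrained("merged-model")
```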

When a model has been tuned it is actually now your own custom model! In fact, compared to the original base Mistral, the model is totally different:
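You can check that drift directly; a rough sketch comparing a tuned checkpoint against base Mistral, per tensor (the tuned repo name is a placeholder):

```python
# Rough check of how far a tuned model's weights have drifted from base Mistral.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16)
tuned = AutoModelForCausalLM.from_pretrained("my-org/my-tuned-model", torch_dtype=torch.bfloat16)

base_state, tuned_state = base.state_dict(), tuned.state_dict()
for name in base_state:
    drift = (tuned_state[name].float() - base_state[name].float()).abs().mean().item()
    print(f"{name}: mean |delta| = {drift:.6f}")
```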

But!:
Recently I understood that you can just start a new model from any model in transformers, based on any base code model, and even combine architectures with other models to bridge the gap between them, changing the neural network itself into a custom network. Hence, by adding fine-tuning you have actually added an extra custom layer to the model, despite it being "merged into the model at the end of the process to apply the training". As you know, these LoRA models are interchangeable (on the same base code), but not on hybrid networks.
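As a minimal sketch of that final merge step with a LoRA adapter (using PEFT; the adapter repo name is a placeholder, and the adapter must have been trained on the same base architecture):

```python
# Sketch: fold a LoRA adapter into its base model at the end of training.
# Adapters are interchangeable across checkpoints of the same base architecture,
# but cannot be loaded onto a different (hybrid) network.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base, "my-org/my-lora-adapter")  # placeholder adapter
model = model.merge_and_unload()  # merge the LoRA weights into the base tensors
model.save_pretrained("tuned-model")
```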

So, with all sympathy for your understanding of neural networks and how they are customized: based on that knowledge you should understand that unless you initiated your model from base, they are all merged models!
Hence remote code, and strange models:
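To make that distinction concrete, a quick sketch of initiating a model from base versus loading a custom-architecture repo that ships its own modelling code (the custom repo name is a placeholder):

```python
# Sketch: a randomly initialised model "from base" vs. a repo with custom
# architecture code, which is why trust_remote_code is needed for the latter.
from transformers import AutoConfig, AutoModelForCausalLM

# Fresh, untrained weights built from an existing architecture's config.
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
fresh = AutoModelForCausalLM.from_config(config)

# A custom or hybrid network defined by modelling code inside the repo itself.
custom = AutoModelForCausalLM.from_pretrained(
    "my-org/custom-hybrid-model",  # placeholder repo
    trust_remote_code=True,
)
```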

If your model is a 7B, you may even notice that the layers between models are interchangeable!
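A rough sketch of that layer swap between two 7B checkpoints of the same architecture (repo names and layer indices are only examples, and the `.model.layers` path assumes a Llama/Mistral-style implementation):

```python
# Sketch: transplant whole transformer layers from one 7B model into another
# 7B model that shares the same architecture.
from transformers import AutoModelForCausalLM

donor = AutoModelForCausalLM.from_pretrained("my-org/donor-7b")    # placeholder
target = AutoModelForCausalLM.from_pretrained("my-org/target-7b")  # placeholder

for idx in (10, 11, 12):  # arbitrary example layers to swap
    target.model.layers[idx].load_state_dict(donor.model.layers[idx].state_dict())

target.save_pretrained("layer-swapped-7b")
```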

So for me this model is way past a merge, hence Mega Merge! They are also often updated in place rather than pushed as new models, so that I can find them again.
All of my models are stages in a single model's development!

Thanks, bro!
