This is amazing! Did you use my merge method?

#2 by rombodawg - opened

It's really amazing that you were able to top my model. I see that you used mergekit to make it:

https://huggingface.co/fblgit/cybertron-v4-qw7B-MGS/blob/main/model.safetensors.index.json

Did you use my merge method (Continuous Finetuning) to create this after training your model?

I'd imagine your merge would have looked something like this:

models:
  - model: Qwen_Qwen2.5-7B-Instruct
    parameters:
      weight: 1
      density: 1
  - model: Qwen_Qwen2.5-7B-Magpie-Qwen2.5-Pro-1M-v0.1-Tuned
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: Qwen_Qwen2.5-7B
parameters:
  weight: 1
  density: 1
  normalize: true
  int8_mask: true
dtype: bfloat16
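
(For reference, a config like this is normally applied with mergekit itself. Below is a minimal sketch of doing that from Python, assuming the MergeConfiguration / run_merge / MergeOptions entry points described in the mergekit README; filenames, paths and option values are illustrative, and the mergekit-yaml CLI can be used instead.)

# Minimal sketch: apply the TIES config above with mergekit.
# Entry points follow the mergekit README; check your installed version,
# as option names may differ. Paths and filenames are hypothetical.
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YML = "ties-merge.yaml"   # the YAML config above, saved to a file
OUTPUT_PATH = "./merged-model"   # where the merged checkpoint will be written

with open(CONFIG_YML, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path=OUTPUT_PATH,
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # merge on GPU when available
        copy_tokenizer=True,             # keep the base model's tokenizer
    ),
)
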
Owner • edited Oct 30

nah brother.. if i were to use that.. u would see it in the README.. on the citations part.
Unfortunately, I haven't been able to prove your continuous theory.. just like your results.. the result of doing that is a turd.

@fblgit I'm not sure what you mean by "just like your results.. the result of doing that is a turd", as my results have only improved the models' performance.

It's clear from your "model.safetensors.index.json" that you used mergekit to create this model, so can you at least share what you did with mergekit after tuning?

Owner

mm.. fair question. and since u were open about it.. i'll do the same:
https://arxiv.org/pdf/2410.21228

U can see the SFT vs LoRA differences, and the forgetting impact of the corpora.
Think about how u can tackle that with mergekit using ur own SFT LoRAs.
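
(One way to read that hint, not a confirmed description of what was done here: fold your own SFT LoRA back into its base with peft so you get a full checkpoint, then TIES-merge that checkpoint with the original instruct model using a mergekit config like the one earlier in this thread, so the instruction-tuned behaviour the SFT corpus would otherwise wash out is retained. A minimal sketch, with hypothetical paths and model names:)

# Sketch only: bake an SFT LoRA into its base model with peft. The resulting
# full-weight folder can then be listed under `models:` in a mergekit TIES
# config such as the one above. Paths and names below are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen2.5-7B"    # base model the LoRA was trained on
LORA = "./my-sft-lora"      # your SFT LoRA adapter
OUT  = "./my-sft-merged"    # full checkpoint for mergekit to consume

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, LORA)
merged = model.merge_and_unload()   # apply the LoRA deltas to the base weights

merged.save_pretrained(OUT)
AutoTokenizer.from_pretrained(BASE).save_pretrained(OUT)

The merged folder would then take the place of the fine-tuned entry in the TIES config, with the instruct model kept as the other entry.
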

fblgit changed discussion status to closed
