Do you need to fine-tune after merging?

#5
by tanganke - opened

Great model. I wonder how you got the weights for the MoE routers?

Owner

No need to fine-tune, because there are only two experts.
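To illustrate the point (a rough sketch, not this repo's actual modeling code, and assuming the usual Mixtral-style top-2 routing): with only two experts and top-2 routing, both experts contribute to every token and the router merely sets their mixing weights, so a router initialized from representative prompts already works without further training.

import torch
import torch.nn.functional as F

def route(hidden, gate_weight, experts, top_k=2):
    # hidden: (tokens, dim), gate_weight: (num_experts, dim)
    logits = hidden @ gate_weight.T                # (tokens, num_experts)
    weights, idx = F.softmax(logits, dim=-1).topk(top_k, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)
    out = torch.zeros_like(hidden)
    for k in range(top_k):
        for e, expert in enumerate(experts):
            mask = idx[:, k] == e
            if mask.any():
                out[mask] += weights[mask, k, None] * expert(hidden[mask])
    return out

dim = 8
experts = [torch.nn.Linear(dim, dim), torch.nn.Linear(dim, dim)]  # stand-ins for the two expert MLPs
gate = torch.randn(2, dim)  # here the rows would come from prompt hidden states, not random init
print(route(torch.randn(5, dim), gate, experts).shape)  # torch.Size([5, 8])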

Thanks for your time!

I am also trying to construct a MoE model like this using mergekit.
The configuration needs to specify a base model and positive prompts. How did you set these?

base_model: ???
gate_mode: hidden
dtype: float32

experts:
  - source_model: NurtureAI/neural-chat-7b-v3-16k # https://huggingface.co/NurtureAI/neural-chat-7b-v3-16k
    positive_prompts:
      - "???"
    #   (optional)
    # negative_prompts:
    #   - "This is a prompt expert_model_1 should not be used for"
  - source_model: mncai/mistral-7b-dpo-v6 # https://huggingface.co/mncai/mistral-7b-dpo-v6
    positive_prompts:
      - "???"

You have to try each candidate and then test the model's performance locally with https://github.com/EleutherAI/lm-evaluation-harness.
I only use the hellaswag metric plus some manual testing.
You will find the best setting sooner or later.
Good luck!
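A minimal version of that hellaswag check with the harness's Python API looks roughly like this (v0.4-style API; exact function names and result keys vary between harness versions, and ./merged-moe is just a placeholder path for the merged model):

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./merged-moe,dtype=bfloat16",
    tasks=["hellaswag"],
    batch_size=8,
)
print(results["results"]["hellaswag"])  # acc / acc_norm, for a quick comparison between candidates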

tanganke changed discussion status to closed
