One or two models during inference?

#3
by Venkman42 - opened

Hi there,
Does this model route to just one of the models during inference, or does it use both?
In other words, is the inference speed comparable to a 7B model or a 13B model?

Just curious, since the Mixtral 8x7B models use two experts per token during inference, as far as I know.

Owner

I think this is decided by the num_experts_per_tok setting.

How do you set the "num_experts_per_tok" config?

@SamuelAzran It's in config.json.
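For illustration, here is a minimal sketch of what that looks like, assuming a Mixtral-style config.json; the field names are real Mixtral config keys, but the values shown are placeholders, not this model's actual settings:

```python
import json

# Hypothetical excerpt of a Mixtral-style config.json; only the
# routing-related fields are shown.
config = {
    "model_type": "mixtral",
    "num_local_experts": 2,    # total experts in the merged model
    "num_experts_per_tok": 2,  # experts the router activates per token
}

# Lowering num_experts_per_tok to 1 makes the router activate a
# single expert per token, so per-step compute is closer to a
# single 7B model; 2 activates both experts, closer to 13B-class.
config["num_experts_per_tok"] = 1
print(json.dumps(config, indent=2))
```

Editing the value in the model's config.json before loading changes how many experts are active at inference time.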