One or two models during inference?

#3
by Venkman42 - opened

Hi there,
Does this model route to just one of the models during inference, or does it use both?
In other words, is the inference speed comparable to a 7B model or a 13B model?

Just curious, since the Mixtral 8x7B models use two experts per token during inference, as far as I know.

Owner

I think this is decided by the num_experts_per_tok setting.

How do you set the "num_experts_per_tok" config?

@SamuelAzran It's in config.json.
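For illustration, here is a minimal sketch of what that looks like, assuming a Mixtral-style config.json; the field names are real Mixtral config keys, but the values shown are placeholders, not this model's actual settings:

```python
import json

# Hypothetical excerpt of a Mixtral-style config.json; only the
# routing-related fields are shown.
config = {
    "model_type": "mixtral",
    "num_local_experts": 2,    # total experts in the merged model
    "num_experts_per_tok": 2,  # experts the router activates per token
}

# Lowering num_experts_per_tok to 1 makes the router activate a
# single expert per token, so per-step compute is closer to a
# single 7B model; 2 activates both experts, closer to 13B-class.
config["num_experts_per_tok"] = 1
print(json.dumps(config, indent=2))
```

Editing the value in the model's config.json before loading changes how many experts are active at inference time.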