Wait, a 4x13B model?
#1 by mirek190 - opened
WTF ;D
Yeah, the transformers library supports defining your own MoE and fine-tuning it. A MoE is implemented with an nn.Linear gate followed by a softmax, then the top-k experts are chosen per token. But I am still curious how it is done here.
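The gating mechanism described above (linear gate, softmax, pick the top-k experts) can be sketched roughly like this. This is a minimal illustration in PyTorch, not the actual implementation of this model; the class name, expert shape, and expert count are all made up for the example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sparse MoE layer: a linear gate scores the experts,
    the top-k scores are softmaxed into mixing weights, and only the
    chosen experts run on each token."""

    def __init__(self, dim: int, num_experts: int = 4, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts, bias=False)
        # Hypothetical experts: small feed-forward blocks, one per expert.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim)
        scores = self.gate(x)                              # (tokens, num_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)           # renormalize over the k picks
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

The double loop is for clarity; real implementations batch tokens per expert for speed.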
Edit:
Here is the discussion: https://huggingface.co/Undi95/Llamix2-MLewd-4x13B/discussions/1?not-for-all-audiences=true
This is probably a dumb question, but I won't know the answer till I ask, and research hasn't quite made it clear to me. How do I determine the max context size I can use with this? I see it limited to 2048 in the default SillyTavern setup, but I've seen it mentioned that you can turn it up higher in some cases. If I'm asking in the wrong place I apologize, and then ask: where's the right place?
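The thread never answers this, but one way to check (my assumption, not something the thread confirms) is to read the model's config.json from its repo: Llama-2-style configs store the trained context window as `max_position_embeddings`, and Llama-2 13B derivatives were trained at 4096. A tiny helper, with a made-up config dict for illustration:

```python
def max_context(config: dict) -> int:
    """Return the trained context window from a parsed config.json.

    Llama-2-style configs expose it as max_position_embeddings;
    fall back to 2048 (the old Llama-1 default) if the key is absent.
    """
    return int(config.get("max_position_embeddings", 2048))

# Example config fragment, shaped like a Llama-2 13B derivative's config.json
llama2_cfg = {"model_type": "llama", "max_position_embeddings": 4096}
```

Frontends like SillyTavern default conservatively to 2048, so raising the slider up to the config value is usually safe; going beyond it needs a scaling trick like RoPE/NTK scaling.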