Sparsity in Mixtral

#137

by dpk17 - opened Feb 20

What are the sparse weights in Mixtral? I looked at the intermediate layer, whose weight matrices have shape [14336, 4096], and counted the non-zero entries with torch.count_nonzero(x), applied to the weights used in that layer's forward pass. Every entry in the matrix was non-zero. Which weights in the model are actually sparse?
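
For reference, the kind of check described above can be sketched roughly as follows. This is only a minimal sketch under assumptions: `nonzero_fraction` is a hypothetical helper name, and a small random tensor stands in for the real [14336, 4096] weight so the snippet runs without loading the model.

```python
import torch


def nonzero_fraction(weight: torch.Tensor) -> float:
    """Return the fraction of entries in `weight` that are non-zero."""
    return torch.count_nonzero(weight).item() / weight.numel()


# Assumption: a small random matrix stands in for one intermediate-layer
# weight; the real matrix has shape [14336, 4096].
torch.manual_seed(0)
dense = torch.randn(64, 32)
print("dense weight, non-zero fraction:", nonzero_fraction(dense))

# For contrast, a weight-sparse matrix: every other column is zeroed out.
pruned = dense.clone()
pruned[:, ::2] = 0.0
print("pruned weight, non-zero fraction:", nonzero_fraction(pruned))
```

A result of 1.0 for a given matrix means that matrix is fully dense, which matches the observation above.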
