Sparsity in Mixtral

#137
by dpk17 - opened

What are the sparse weights in Mixtral? I looked at the intermediate layer, whose weight matrices have shape [14336, 4096], and counted the number of non-zero entries with torch.count_nonzero(x). I ran the count directly on the weights in the forward layer of the intermediate layer. Every entry in the matrix was non-zero. Which weights in the model are actually sparse?
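For reference, the check I ran looks roughly like the sketch below. It uses a small synthetic tensor as a stand-in, not the actual Mixtral weights. The helper name `nonzero_fraction` and the 64 x 32 shape are illustrative assumptions only, and the half-zeroed copy is there just to show what a sparse result would look like.

```python
import torch


def nonzero_fraction(weight: torch.Tensor) -> float:
    """Return the fraction of entries in `weight` that are non-zero."""
    return torch.count_nonzero(weight).item() / weight.numel()


# Small synthetic stand-in for a [14336, 4096] weight matrix (hypothetical data).
# Entries are drawn from [1, 2), so none of them can be zero.
torch.manual_seed(0)
dense = torch.rand(64, 32) + 1.0

# Copy of the same matrix with the first half of the rows zeroed, for comparison.
half_zero = dense.clone()
half_zero[:32] = 0.0

print(nonzero_fraction(dense))      # fully dense matrix
print(nonzero_fraction(half_zero))  # half of the entries are zero
```

On the real model, the same helper could be applied to each tensor of the loaded weights. A result of 1.0 everywhere would match what I saw.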
