Why is the pruned model bigger than the original one after 24 layers have been sliced?

#1
by iheardyoulooking - opened

Usually, after structured pruning, the model should be smaller, but:
the original one: 15GB
sliced one: 20GB+

@iheardyoulooking it's because the sliced model was uploaded in 32-bit float format (float32), whereas the original Mistral is stored in bfloat16. That makes each parameter in the sliced version twice as big on disk (4 bytes instead of 2).
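As a rough sanity check, the arithmetic can be sketched from the published Mistral-7B config values (hidden size 4096, MLP size 14336, 32 layers, 8 KV heads, vocab 32000). The sketch assumes the sliced model keeps 24 of the 32 layers, which is a guess based on the repo name, and it ignores file-format overhead:

```python
# Published Mistral-7B-v0.2 config values.
HIDDEN = 4096
INTERMEDIATE = 14336
VOCAB = 32000
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = HIDDEN // NUM_HEADS  # 128


def count_params(num_layers: int) -> int:
    """Parameter count for a Mistral-style decoder (no biases, untied embeddings)."""
    kv_dim = NUM_KV_HEADS * HEAD_DIM
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q/o and k/v projections
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up and down projections
    norms = 2 * HIDDEN  # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings + lm_head
    final_norm = HIDDEN
    return num_layers * per_layer + embeddings + final_norm


def size_gb(num_layers: int, bytes_per_param: int) -> float:
    """Approximate checkpoint size in GB (10^9 bytes)."""
    return count_params(num_layers) * bytes_per_param / 1e9


original_gb = size_gb(32, 2)     # original model, bfloat16
sliced_fp32_gb = size_gb(24, 4)  # assumption: 24 layers kept, float32
sliced_bf16_gb = size_gb(24, 2)  # same model if re-saved in bfloat16

print(f"original bf16: {original_gb:.1f} GB")
print(f"sliced fp32:   {sliced_fp32_gb:.1f} GB")
print(f"sliced bf16:   {sliced_bf16_gb:.1f} GB")
```

Under those assumptions this comes out to roughly 14.5 GB for the original, about 22 GB for the float32 sliced upload, and about 11 GB if the sliced model were saved in bfloat16, which lines up with the 15GB and 20GB+ figures above.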

You can still load the model in 16-bit by passing the torch_dtype argument:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer')
model = AutoModelForCausalLM.from_pretrained(
    'arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer',
    torch_dtype=torch.bfloat16
)

@Shamane should we re-upload in 16 bit? I can do it if you'd like
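For anyone who wants to confirm which dtype a checkpoint was saved in without loading the weights, here is a minimal sketch that reads only the header of a safetensors file (an 8-byte little-endian length followed by a JSON header). It assumes the repo stores its weights as safetensors shards; the file name in the usage comment is hypothetical:

```python
import json
import struct
from collections import Counter


def safetensors_dtypes(path: str) -> Counter:
    """Count tensors per dtype by reading only the safetensors header."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return Counter(
        entry["dtype"] for name, entry in header.items() if name != "__metadata__"
    )


# Usage (hypothetical local shard name):
# print(safetensors_dtypes("model-00001-of-00003.safetensors"))
```

A float32 upload would report "F32" for its tensors, while a bfloat16 one would report "BF16".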

Arcee AI org

Yes, exactly. I think I made a mistake. @thomasgauthier, please go ahead, and thanks a lot.

Shamane changed discussion status to closed
Shamane changed discussion status to open
Shamane changed discussion status to closed
