Saving a quantised model throws an error

#68 opened by Manmax31

I am trying to quantise the bert-base-uncased model using the following code:
import torch
from transformers import BertTokenizer, BertModel
import os
from safetensors.torch import save_file

bert_base_model_path = "google-bert/bert-base-uncased"
bert_tokenizer = BertTokenizer.from_pretrained(bert_base_model_path)
bert_model = BertModel.from_pretrained(bert_base_model_path, output_attentions=True)

bert_device = torch.device("cpu")
bert_model.to(bert_device)

# Dynamically quantise all Linear layers to int8
quantized_bert_model = torch.quantization.quantize_dynamic(
    bert_model, {torch.nn.Linear}, dtype=torch.qint8
)

output_dir = "bert_base_uncased_quantised/"
SAFE_WEIGHTS_NAME = "model.safetensors"
state_dict = quantized_bert_model.state_dict()

os.makedirs(output_dir, exist_ok=True)  # make sure the target directory exists
save_file(state_dict, os.path.join(output_dir, SAFE_WEIGHTS_NAME), metadata={"format": "pt"})

When I try to save it, I get the following error:
ValueError: Key encoder.layer.0.attention.self.query._packed_params.dtype is invalid, expected torch.Tensor but received <class 'torch.dtype'>
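From what I can tell, safetensors' save_file only accepts torch.Tensor values, while the state dict of a dynamically quantised model also contains packed-param entries such as a torch.dtype. A quick diagnostic (a sketch against the code above) lists the offending entries:

# List state-dict entries that are not tensors; these are what
# safetensors' save_file refuses to serialise.
for key, value in state_dict.items():
    if not isinstance(value, torch.Tensor):
        print(key, type(value))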

How can I save this quantised model, so I can reload it elsewhere using BertModel.from_pretrained?
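For now the only workaround I have found is plain torch.save, which pickles the non-tensor entries, but it sidesteps from_pretrained rather than using it (a sketch under that assumption; the file name pytorch_model.bin is just illustrative):

# Workaround sketch: torch.save uses pickle, so the non-tensor
# packed-params entries serialise without complaint.
torch.save(state_dict, os.path.join(output_dir, "pytorch_model.bin"))

# Reloading: re-apply dynamic quantisation to a fresh model so the
# module graph matches, then load the saved state dict into it.
reloaded_model = torch.quantization.quantize_dynamic(
    BertModel.from_pretrained(bert_base_model_path), {torch.nn.Linear}, dtype=torch.qint8
)
reloaded_model.load_state_dict(torch.load(os.path.join(output_dir, "pytorch_model.bin")))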
