Breaks with quantization in newer transformers revisions (fix included)

#21
by panopstor - opened

Recent changes to the Hugging Face transformers package will break this implementation. inv_freq is stored in the state_dict but is not present in the module's _parameters, which causes a missing-key failure (KeyError) during quantized loading.

The other models in the core transformers package that use rotary embeddings were all modified here: https://github.com/huggingface/transformers/pull/28837/files

However, since this model is loaded via remote code, it will simply break after updating transformers if you use load_in_4bit or pass in a bnb_config (BitsAndBytesConfig) to turn quantized loading on.
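
For reference, this is roughly the call pattern that breaks; a minimal sketch, assuming the THUDM/cogvlm-chat-hf repo id and fp16 weights (adjust to whichever CogVLM checkpoint you are actually loading):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True)

# On recent transformers releases this raises a KeyError on the rotary embedding's
# inv_freq buffer while the bnb quantizer decides which tensors to quantize.
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",          # model id assumed for illustration
    torch_dtype=torch.float16,
    trust_remote_code=True,
    quantization_config=bnb_config,
)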

It can be monkey-patched like so:

import torch
import bitsandbytes as bnb
from typing import Any, Dict

from transformers.quantizers.quantizer_bnb_4bit import Bnb4BitHfQuantizer, get_module_from_name
from transformers.modeling_utils import PreTrainedModel

# CogVLM stores inv_freq in the state dictionary, but it is not in the module's _parameters,
# so it cannot be quantized.
# This was patched in transformers for other models here:
# https://github.com/huggingface/transformers/pull/28837/files, but CogVLM is not part of transformers.
def patched_check_quantized_param(
    self,
    model: "PreTrainedModel",
    param_value: "torch.Tensor",
    param_name: str,
    state_dict: Dict[str, Any],
) -> bool:
    module, tensor_name = get_module_from_name(model, param_name)
    if tensor_name == "inv_freq":  # PATCH: inv_freq only exists in the state_dict, not in _parameters
        return False  # PATCH
    if isinstance(module._parameters[tensor_name], bnb.nn.Params4bit):  # raised KeyError on inv_freq
        return True
    elif isinstance(module, bnb.nn.Linear4bit) and tensor_name == "bias":
        return True
    else:
        return False

Bnb4BitHfQuantizer.check_quantized_param = patched_check_quantized_param
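
With the patch in place, inv_freq is simply left to the normal (non-quantized) loading path, which mirrors what the PR above does for in-tree models. A minimal usage sketch, again assuming the THUDM/cogvlm-chat-hf repo id:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Patch before loading: check_quantized_param is consulted while the checkpoint is
# being materialized, so the assignment must already be in place at this point.
Bnb4BitHfQuantizer.check_quantized_param = patched_check_quantized_param

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",          # model id assumed for illustration
    torch_dtype=torch.float16,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)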

I'm wondering if KEG would consider contributing modeling_cogvlm.py to the core transformers package? It would be more likely to be maintained there. The licenses appear to be compatible (Apache), but it is not my place to do so.
