Consistency check failed - model-00019-of-00019.safetensors

#118
by br1-pist - opened

Python modules:
transformers==4.37.1
peft==0.7.1
accelerate==0.26.1
bitsandbytes==0.42.0

Hello. I'm trying to fine-tune mistralai/Mixtral-8x7B-Instruct-v0.1 on Amazon SageMaker. It was working properly until today, when I started receiving the following error:

AlgorithmError: OSError('Consistency check failed: file should be of size 4221679088 but has size 3663094841 (model-00019-of-00019.safetensors).\nWe are sorry for the inconvenience. Please retry download and pass `force_download=True, resume_download=False` as argument.\nIf the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.'), exit code: 1

I'm already passing force_download=True and resume_download=False, as the message suggests:

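import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization, computing in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Force a fresh download of every shard instead of resuming a partial one
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    trust_remote_code=True,
    force_download=True,
    resume_download=False,
    quantization_config=bnb_config,
    device_map="auto",
)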

I've also noticed that the following warning appears while the shards are being downloaded:

UserWarning: Not enough free disk space to download the file. The expected file size is: 4221.68 MB. The target location /root/.cache/huggingface/hub only has 3747.39 MB free disk space.

I have more than 400 GB of free space on the instance, so the device as a whole should not be running out of space.
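One thing that may be worth checking, as a sketch rather than a confirmed diagnosis: the warning reports free space for the filesystem that holds /root/.cache/huggingface/hub, which is not necessarily the same volume as the one with 400 GB free. The snippet below uses only the Python standard library to compare the free space at a few candidate locations against the shard size from the error message. The helper names (nearest_existing_dir, free_bytes, has_room) are made up for illustration, and /tmp is only an assumed candidate; the large volume on a given SageMaker instance may be mounted somewhere else.

```python
import os
import shutil


def nearest_existing_dir(path):
    """Walk up from `path` until a directory that actually exists is found."""
    path = os.path.abspath(os.path.expanduser(path))
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def free_bytes(path):
    """Free bytes on the filesystem that would hold `path`."""
    return shutil.disk_usage(nearest_existing_dir(path)).free


def has_room(path, required_bytes):
    """True if the filesystem holding `path` can fit `required_bytes`."""
    return free_bytes(path) >= required_bytes


# Expected size of model-00019-of-00019.safetensors, taken from the error message.
EXPECTED_SHARD_BYTES = 4221679088

# Candidate locations: the default cache from the warning, plus /tmp (an assumption).
for candidate in ("~/.cache/huggingface/hub", "/tmp"):
    free_mb = free_bytes(candidate) / 1e6
    verdict = "enough" if has_room(candidate, EXPECTED_SHARD_BYTES) else "NOT enough"
    print(f"{candidate}: {free_mb:.2f} MB free ({verdict} for one shard)")
```

If another location turns out to have room, the download can be pointed there by passing cache_dir="<that path>" to from_pretrained, or by setting the HF_HOME environment variable before transformers is imported. Note that a check against a single shard is only a lower bound, since all 19 shards end up in the same cache.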

Can you please help me solve the problem? Thank you

br1-pist changed discussion title from Consistency check failed to Consistency check failed - model-00019-of-00019.safetensors
