Is this fp16 or mixed fp16?

#1
by sslx - opened

Thanks for the conversion.
Just wondering if it's mixed fp16 or only fp16?
Would that make a difference when fine-tuning?

It should be fp16 only. It was converted from @ehartford's original fp32 repo with this code:

import torch
from transformers import AutoTokenizer, LlamaForCausalLM
import argparse
import os

parser = argparse.ArgumentParser(description='Convert fp32 model to fp16')
parser.add_argument('model_dir', type=str, help='fp32 model folder')
parser.add_argument('output_dir', type=str, help='fp16 output folder')
parser.add_argument('--device', type=str, default="cuda:0", help='device')

args = parser.parse_args()

model_dir = args.model_dir
output_dir = args.output_dir

# Load the fp32 checkpoint, casting the weights to fp16 as they are loaded
model = LlamaForCausalLM.from_pretrained(
            model_dir,
            load_in_8bit=False,
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True,
            device_map='auto'
            )

model.use_cache = True

# Write the fp16 weights back out in Hugging Face format
LlamaForCausalLM.save_pretrained(
            model, output_dir, torch_dtype=torch.float16
            )
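For reference, the script takes the fp32 folder and the output folder as positional arguments, so assuming you save it as convert_to_fp16.py (a filename I'm just using for illustration), you'd run it like:

python convert_to_fp16.py /path/to/fp32-model /path/to/fp16-output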

Thanks so much for the info!
The original LLaMA 13B seems larger by 2.4GB, so the base LLaMA must be in mixed fp16.
It might make a difference if you want to fine-tune this.
Thanks for the script. I can probably modify it to get mixed fp16.
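
If you want to check for yourself whether a checkpoint is plain fp16 or mixed precision, here's a minimal sketch (assuming the model fits in CPU RAM; the path is just a placeholder): load with torch_dtype="auto" so the stored dtypes are preserved, then tally bytes per dtype.

import torch
from transformers import LlamaForCausalLM

# "auto" keeps whatever dtypes are stored in the checkpoint instead of casting
model = LlamaForCausalLM.from_pretrained(
    "path/to/model",        # placeholder path
    torch_dtype="auto",
    low_cpu_mem_usage=True,
)

bytes_per_dtype = {}
for name, p in model.named_parameters():
    bytes_per_dtype[p.dtype] = bytes_per_dtype.get(p.dtype, 0) + p.numel() * p.element_size()

# A uniformly fp16 model prints a single torch.float16 entry
for dtype, nbytes in bytes_per_dtype.items():
    print(f"{dtype}: {nbytes / 1e9:.2f} GB")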

Oh, huh. I didn't notice that.

Is that going to be a problem? I can try saving it again and see if I can get it to save identically to how LLaMA 13B saves.

btw, the save call can just be LlamaForCausalLM.save_pretrained(model, output_dir), and when you load the model, use torch_dtype=torch.bfloat16 instead of torch_dtype=torch.float16 if you want to save in mixed fp16.
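
In other words, something like this (a sketch reusing model_dir and output_dir from the script above):

import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,   # or torch.float16 for plain fp16
    low_cpu_mem_usage=True,
    device_map='auto',
)

# save_pretrained writes the weights in whatever dtype they already are,
# so no torch_dtype argument is needed here
model.save_pretrained(output_dir)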

Apologies! It turns out the model was truncated, which is why it was 3GB short. Something went wrong in the fp32 -> fp16 script I used. I'm still debugging that so I can avoid it in future.

I have just re-uploaded the model, and it is now the correct size for fp16. So please re-download it.

I don't believe it was anything to do with mixed fp16. Simply that some tensors were missing!
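
If anyone wants to sanity-check their download, here's a rough sketch (assuming a sharded HF checkpoint with a pytorch_model.bin.index.json next to the shards; the path is a placeholder): compare the total size recorded in the index against the shard files on disk.

import json
import os

model_dir = "path/to/downloaded/model"  # placeholder path

# save_pretrained records the total tensor size (in bytes) in the shard index
with open(os.path.join(model_dir, "pytorch_model.bin.index.json")) as f:
    index = json.load(f)

expected = index["metadata"]["total_size"]
on_disk = sum(
    os.path.getsize(os.path.join(model_dir, shard))
    for shard in set(index["weight_map"].values())
)

# The two numbers won't match byte-for-byte (serialization overhead),
# but a multi-GB shortfall means tensors are missing or truncated.
print(f"index total: {expected / 1e9:.2f} GB, shards on disk: {on_disk / 1e9:.2f} GB")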

Thanks again!
Is 4-bit also impacted by this?
Should I redownload 4-bit as well?

No, the 4-bit is fine. It was only the HF model that had issues.

Great, thanks!

sslx changed discussion status to closed
