The size of my Vicuna model is twice yours. Why is that?

#7
by David003

I fine-tuned LLaMA-7B to get my Vicuna model, and I found that the Vicuna model folder is twice the size of the original LLaMA-7B folder.

The original LLaMA-7B is about 13 GB:
drwxrwxr-x 8 ubuntu ubuntu 4096 Jun 11 09:32 .git/
-rw-rw-r-- 1 ubuntu ubuntu 1546 Jun 11 09:31 .gitattributes
-rw-rw-r-- 1 ubuntu ubuntu 177 Jun 11 09:31 README.md
-rw-rw-r-- 1 ubuntu ubuntu 507 Jun 11 09:31 config.json
-rw-rw-r-- 1 ubuntu ubuntu 137 Jun 11 09:31 generation_config.json
-rw-rw-r-- 1 ubuntu ubuntu 9976634558 Jun 11 09:31 pytorch_model-00001-of-00002.bin
-rw-rw-r-- 1 ubuntu ubuntu 3500315539 Jun 11 09:31 pytorch_model-00002-of-00002.bin
-rw-rw-r-- 1 ubuntu ubuntu 26788 Jun 11 09:31 pytorch_model.bin.index.json
-rw-rw-r-- 1 ubuntu ubuntu 411 Jun 11 09:31 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu 499723 Jun 11 09:31 tokenizer.model
-rw-rw-r-- 1 ubuntu ubuntu 727 Jun 11 09:31 tokenizer_config.json

While the Vicuna model is about 28 GB:

drwxrwxr-x 2 ubuntu ubuntu 4096 Jun 7 04:38 ./
drwxrwxr-x 22 ubuntu ubuntu 4096 Jun 11 10:43 ../
-rw-rw-r-- 1 ubuntu ubuntu 548 May 18 03:31 config.json
-rw-rw-r-- 1 ubuntu ubuntu 132 May 18 03:31 generation_config.json
-rw-rw-r-- 1 ubuntu ubuntu 9877989586 May 18 03:31 pytorch_model-00001-of-00003.bin
-rw-rw-r-- 1 ubuntu ubuntu 9894801014 May 18 03:31 pytorch_model-00002-of-00003.bin
-rw-rw-r-- 1 ubuntu ubuntu 7180990649 May 18 03:31 pytorch_model-00003-of-00003.bin
-rw-rw-r-- 1 ubuntu ubuntu 26788 May 18 03:31 pytorch_model.bin.index.json
-rw-rw-r-- 1 ubuntu ubuntu 435 May 18 03:31 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu 499723 May 18 03:31 tokenizer.model
-rw-rw-r-- 1 ubuntu ubuntu 727 May 18 03:31 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu 4830 May 18 03:31 trainer_state.json
-rw-rw-r-- 1 ubuntu ubuntu 3771 May 18 03:31 training_args.bin

Is this expected, or is something wrong? Thanks.
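
The arithmetic would fit a precision difference: roughly 6.7 billion parameters times 2 bytes is about 13.5 GB (float16), while times 4 bytes is about 27 GB (float32). A quick way to check which precision each checkpoint was saved in is to look at the torch_dtype field that transformers usually writes into config.json; a minimal sketch (the folder names are placeholders, adjust them to your paths):

import json

# Check which dtype each checkpoint was saved in: transformers usually
# records a "torch_dtype" field in config.json when a model is saved.
# The folder names below are placeholders -- adjust to your own paths.
for folder in ("llama-7b", "vicuna-7b"):
    with open(f"{folder}/config.json") as f:
        cfg = json.load(f)
    print(folder, cfg.get("torch_dtype"))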

That's because yours is stored in float32. It's not a big problem, apart from using more disk space. If you want, you can use this script to convert it to float16:

import argparse

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

parser = argparse.ArgumentParser(description='Convert an fp32 model to fp16')
parser.add_argument('model_dir', type=str, help='fp32 model folder')
parser.add_argument('output_dir', type=str, help='fp16 output folder')
args = parser.parse_args()

# Load the fp32 checkpoint on CPU; low_cpu_mem_usage streams the shards
# instead of building a second full copy of the weights in memory.
model = AutoModelForCausalLM.from_pretrained(
    args.model_dir,
    torch_dtype=torch.float32,
    low_cpu_mem_usage=True,
)

# Cast all weights to float16 and save; the new folder should be
# roughly half the size of the original.
model = model.half()
model.save_pretrained(args.output_dir)

# Copy the tokenizer files so the output folder is self-contained.
tokenizer = AutoTokenizer.from_pretrained(args.model_dir)
tokenizer.save_pretrained(args.output_dir)
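
Usage would be something like the following (the script name and paths are just examples):

python convert_to_fp16.py ./vicuna-7b ./vicuna-7b-fp16

The converted folder should come out at roughly half the size, around 13-14 GB for a 7B model.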

Thank you very much!

Trying to quantize this model to 4 bits:

On Ubuntu, I got a "Killed" response after running the following command:

python llama.py weights/vicuna-7b-delta-v1.1 c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save llama7b-4bit-128g.pt
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Killed

My GPU is an RTX A2000 with 8 GB of memory, so I assume I need to reduce the model to 16 bits for it to fit on the GPU, since it seems to be around 14 GB at 32 bits.
I tried running the 32-to-16-bit conversion on both GPU and CPU and still got killed by the system.
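
A "Killed" message like that usually comes from the Linux out-of-memory killer, meaning system RAM ran out while the checkpoint shards were loading, rather than a GPU error. A minimal sketch to check whether the checkpoint at least fits in RAM when loaded in float16 (the path is a placeholder; this does not do the 4-bit quantization itself):

import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint on CPU in float16; low_cpu_mem_usage=True streams
# the weights shard by shard instead of materializing a full fp32 copy
# first. If this also gets killed, the machine needs more RAM (or swap)
# before any quantization script can run.
model = AutoModelForCausalLM.from_pretrained(
    "weights/vicuna-7b-fp16",  # placeholder path to the fp16 model
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
print(sum(p.numel() for p in model.parameters()) / 1e9, "B parameters loaded")

Even in float16 a 7B model is about 13-14 GB of weights, so it will not fit on an 8 GB card uncompressed; after 4-bit quantization it should drop to roughly 4 GB, but the quantization step itself still needs enough system RAM to hold the fp16 model.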
