Why did the model size increase when applying PoSE?

#1
by sanjeev-bhandari01 - opened

I noticed an increase in the model size of LLaMA 2. What is the reason behind it?

What part of the LLaMA architecture is changed that could increase the model size by such a large margin?

Owner

Hi @sanjeev-bhandari01, thanks for your interest in this work. However, I don't think PoSE increases the model size. It only changes the position ids and rope_base during the continual pre-training phase. In this repo, pytorch_model-00001/2/3-of-00003.bin add up to approximately 28G, which is reasonable for a 7B model, since each parameter takes 4 bytes when torch_dtype in the config file is set to float32. Looking forward to your reply :-)
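For reference, a back-of-the-envelope check of that arithmetic (not from the repo, just the reasoning in the reply above):

```python
# Rough checkpoint-size estimate for a ~7B-parameter model stored in float32.
num_params = 7e9          # approximate parameter count of LLaMA-2-7B
bytes_per_param = 4       # float32 = 4 bytes per parameter
size_gib = num_params * bytes_per_param / 1024**3
print(f"{size_gib:.1f} GiB")  # ~26 GiB, consistent with the ~28G shard total
```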

Hi @dwzhu, understood. So all the model parameters are in float32.

I'm a bit unsure about the process. Should I load this model directly with AutoModelForCausalLM and perform the usual inference, or should I first modify the config to set the context length to 16k?

To explore this, I attempted to load the model in Colab (free version) using fp4 quantization and ran the usual inference without modifying the config. However, I encountered a CUDA out-of-memory error when running inference on a context of 6300 tokens.
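A minimal sketch of what such a 4-bit loading attempt might look like (the repo id below is a placeholder, not the actual checkpoint path; BitsAndBytesConfig is the standard transformers/bitsandbytes 4-bit loading path):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "path/to/pose-llama-2-7b"  # placeholder, substitute the real checkpoint

# 4-bit (fp4) quantized loading, as described in the attempt above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Note: even with 4-bit weights, vanilla attention materializes an O(n^2)
# score matrix, so a ~6300-token prompt can still OOM on a free Colab GPU.
```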

Owner

Hi @sanjeev-bhandari01, since the HF implementation of RoPE scaling is slightly different now compared with when this work was done, I don't think loading directly from AutoModelForCausalLM will work. Maybe you can find some examples of testing this model here. Basically, it uses pose_modeling_llama.py to define the model behavior, which has xformers integrated to avoid OOM in the self-attention module.
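A rough sketch of that approach, assuming pose_modeling_llama.py (from the PoSE repo) exposes a LlamaForCausalLM class with the usual from_pretrained interface; the actual entry point may differ, so check the repo's test scripts:

```python
# Hypothetical usage of the custom modeling file mentioned above; the class
# name and interface are assumed from the HF LLaMA implementation it adapts.
from pose_modeling_llama import LlamaForCausalLM  # local file from the PoSE repo

model = LlamaForCausalLM.from_pretrained(
    "path/to/pose-llama-2-7b",   # placeholder checkpoint path
    torch_dtype="auto",
    device_map="auto",
)
# The xformers memory-efficient attention wired into this class is what avoids
# the O(n^2) memory blow-up of standard self-attention on long contexts.
```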

Ok, thanks a lot @dwzhu, I will look into it.

sanjeev-bhandari01 changed discussion status to closed
