GPU Memory Constraints for 01-ai/Yi-9B-200K Model

#3 by microcn - opened

What are the GPU memory requirements for loading the 01-ai/Yi-9B-200K model? I am currently facing an issue where loading the model with two RTX 4090 GPUs fails when using the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, use_fast=False)

If you lower max_position_embeddings in config.json, the model should load. The required VRAM will also differ depending on whether you have flash attention installed.
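As a rough sanity check, the weights alone are about 18 GB in bf16 (9B parameters × 2 bytes), so it is the long-context buffers on top of that which overflow two 24 GB cards. A minimal sketch of the same idea without editing config.json by hand, overriding the value on an AutoConfig before loading; the 32768 cap below is only an illustrative value, not a recommendation:

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(MODEL_DIR)
config.max_position_embeddings = 32768  # illustrative cap, down from 200K
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, config=config, torch_dtype="auto")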

Add a device_map argument (e.g. device_map="auto") when loading the model so the weights are split across both GPUs; see the sketch below.
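A minimal sketch, assuming the accelerate package is installed (device_map="auto" relies on it to place layers across all visible GPUs):

from transformers import AutoModelForCausalLM, AutoTokenizer

# "auto" shards the checkpoint across both RTX 4090s instead of loading it onto one device
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, use_fast=False)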
