Deploy a quantized model on AWS SageMaker

#4
by sunnykusawa - opened

I am using the SageMaker script provided for deploying model_id = TheBloke/CodeLlama-13B-Instruct-GGUF.

But I want to deploy a specific quantized variant, for example 'Q4_K_M' (Q4_K Medium). How can I do that? In the SageMaker script below there is no provision to specify which quantized variant to deploy from TheBloke/CodeLlama-13B-Instruct-GGUF.
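For context, locally I can pull exactly one quantized file with huggingface_hub, so each variant does exist as a standalone file; I just don't see an equivalent option in the SageMaker script. (A minimal sketch; the filename is assumed from the repo's "Files and versions" tab, so please double-check it.)

```python
# Minimal sketch: fetch one specific quantized variant locally.
# The filename below is assumed from the repo's "Files and versions" tab.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="TheBloke/CodeLlama-13B-Instruct-GGUF",
    filename="codellama-13b-instruct.Q4_K_M.gguf",  # the Q4_K_M variant
)
print(gguf_path)  # local cache path of the downloaded .gguf file
```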

```python
import json
import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# SageMaker execution role (falls back to a named IAM role outside a notebook)
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

# Hub Model configuration: https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'TheBloke/CodeLlama-13B-Instruct-GGUF',
    'SM_NUM_GPUS': json.dumps(4),
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.4.2"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=900,
)
```
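For completeness, this is how I would call the endpoint once it is in service (the standard SageMaker predictor interface; the prompt and generation parameters are just placeholders):

```python
# Standard SageMaker invocation; payload shape follows the TGI container's API.
response = predictor.predict({
    "inputs": "Write a Python function that reverses a string.",
    "parameters": {"max_new_tokens": 256, "temperature": 0.2},
})
print(response)
```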
