Deploy Quantized model on AWS SageMaker
#4
by sunnykusawa - opened
I am using the SageMaker script provided for deploying model_id = TheBloke/CodeLlama-13B-Instruct-GGUF, but I want to deploy a specific quantized variant, for example Q4_K_M. How can I do that? The SageMaker script below has no provision for specifying which quantized variant of TheBloke/CodeLlama-13B-Instruct-GGUF to deploy.
```python
import json

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Hub model configuration: https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'TheBloke/CodeLlama-13B-Instruct-GGUF',
    'SM_NUM_GPUS': json.dumps(4),
}

# Create the Hugging Face Model class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.4.2"),
    env=hub,
    role=role,
)

# Deploy the model to a SageMaker inference endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=900,
)
```