How to increase the response max token size

#99
by philgrey - opened

I've deployed this model to an AWS SageMaker endpoint, and now I want to use the endpoint.
I'm getting a response, but it's too short.
The following is my deployment code:

```python
hub = {
    'HF_MODEL_ID': 'mistralai/Mistral-7B-v0.1',
    'SM_NUM_GPUS': json.dumps(1),
    'MAX_TOTAL_TOKENS': json.dumps(4096)
}
```
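For context, here is a fuller sketch of the env config I'm working from. `MAX_TOTAL_TOKENS` caps prompt plus generated tokens together, so the generation budget is roughly the difference between it and the prompt length (the `MAX_INPUT_LENGTH` variable here is my assumption for the 1.1.0 TGI image; check the container docs for your version):

```python
import json

# Sketch of a TGI env config. MAX_INPUT_LENGTH (an assumption -- verify
# against your TGI image version) caps the prompt; MAX_TOTAL_TOKENS caps
# prompt + generated tokens, so output can use at most the difference.
hub = {
    'HF_MODEL_ID': 'mistralai/Mistral-7B-v0.1',
    'SM_NUM_GPUS': json.dumps(1),
    'MAX_INPUT_LENGTH': json.dumps(2048),   # max prompt tokens (assumed var name)
    'MAX_TOTAL_TOKENS': json.dumps(4096),   # max prompt + output tokens
}
```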

Create the Hugging Face Model class:

```python
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.1.0"),
    env=hub,
    role=role,
)
```

Deploy the model to SageMaker Inference:

```python
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",
    container_startup_health_check_timeout=300,
)
```

And this is my code for generating a response:

```python
llm = SagemakerEndpoint(
    endpoint_name=endpoint,
    region_name="eu-west-2",
    model_kwargs={
        "temperature": 0,
        "maxTokens": 4096,
        "numResults": 3
    },
    content_handler=content_handler,
)
```

```python
llm.generate(["nice to meet you"])
```
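For reference, here is a standalone sketch of how a content handler for the TGI container could shape the request (in LangChain this would subclass `LLMContentHandler`; the class below is my own illustration). The key point is that TGI reads generation options from a `"parameters"` object in the JSON body, and the option that bounds output length there is `max_new_tokens`; a name like `maxTokens` (as in the `model_kwargs` above) is not a TGI parameter and would likely be ignored, which could explain short responses:

```python
import json

# Sketch of a TGI-style content handler (hypothetical standalone class;
# with LangChain you would subclass LLMContentHandler and keep the same
# four members). TGI expects {"inputs": ..., "parameters": {...}} and
# bounds output length via "max_new_tokens" inside "parameters".
class TGIContentHandler:
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        payload = {"inputs": prompt, "parameters": model_kwargs}
        return json.dumps(payload).encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        # TGI returns a JSON list of {"generated_text": ...} objects
        response = json.loads(output.decode("utf-8"))
        return response[0]["generated_text"]


handler = TGIContentHandler()
body = handler.transform_input(
    "nice to meet you",
    {"max_new_tokens": 1024, "do_sample": False},
)
```

The request body produced this way carries `max_new_tokens` where the container actually looks for it, rather than at the top level of `model_kwargs`.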
