How can I increase the response max token size?
I've deployed this model to AWS SageMaker and now I want to use the endpoint. I'm getting a response, but its length is too short.
The following is my deployment code:
import json
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Hub model configuration
hub = {
    'HF_MODEL_ID': 'mistralai/Mistral-7B-v0.1',
    'SM_NUM_GPUS': json.dumps(1),
    'MAX_TOTAL_TOKENS': json.dumps(4096),
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.1.0"),
    env=hub,
    role=role,  # role is defined elsewhere in my notebook
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",
    container_startup_health_check_timeout=300,
)
And this is the code for generating a response:
from langchain.llms import SagemakerEndpoint

llm = SagemakerEndpoint(
    endpoint_name=endpoint,
    region_name="eu-west-2",
    model_kwargs={
        "temperature": 0,
        "maxTokens": 4096,
        "numResults": 3,
    },
    content_handler=content_handler,  # content_handler is defined elsewhere in my notebook
)

llm.generate(["nice to meet you"])
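For reference, my understanding is that the TGI container behind this image reads generation options from a nested "parameters" object in the request body, where "max_new_tokens" controls the completion length; keys like "maxTokens" and "numResults" come from other providers' APIs and would be ignored, so whether the kwargs above ever reach the model depends on how content_handler builds the body. A minimal sketch of building a TGI-style payload (the helper name and defaults are mine, not from any SDK):

```python
import json

def build_tgi_payload(prompt: str, max_new_tokens: int = 1024) -> str:
    """Sketch: build a request body for a TGI-backed SageMaker endpoint.

    TGI reads generation options from the nested "parameters" object;
    "max_new_tokens" caps the completion length (it must fit within the
    MAX_TOTAL_TOKENS budget minus the prompt length).
    """
    body = {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            # greedy decoding instead of temperature=0, which TGI may reject
            "do_sample": False,
        },
    }
    return json.dumps(body)
```

A content handler (or a direct boto3 invoke_endpoint call) would then send this JSON string as the request body.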