Can't reproduce this model's predictions on SageMaker
SageMaker configuration: HF_MODEL_ID is 'Sparticle/llama-2-7b-chat-japanese-lora' and HF_TASK is 'question-answering', i.e. the endpoint is configured for question answering.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # IAM role with permissions to create an endpoint

# Hub model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'Sparticle/llama-2-7b-chat-japanese-lora',  # model_id from hf.co/models
    'HF_TASK': 'question-answering',  # NLP task to use for predictions
}

# Create the Hugging Face Model class
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.26",  # transformers version of the DLC
    pytorch_version="1.13",  # pytorch version of the DLC
    py_version="py39",  # python version of the DLC
)
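The model was then deployed and queried roughly as follows (a sketch of the invocation that produced the error below; the instance type is just an example, not necessarily the one I used):

# Deploy the model to a real-time endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # example instance type
)

# Send a QA-style request
data = {"inputs": {"question": "日本の首都はどこですか?", "context": "日本の首都は東京です。"}}
predictor.predict(data)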
Error information
ModelError Traceback (most recent call last)
Cell In[25], line 5
2 data = {"inputs": {"question": "日本の首都はどこですか?","context": "日本の首都は東京です。"}}
4 # request
----> 5 predictor.predict(data)
File ~/SageMaker/.cs/conda/envs/codeserver_py39/lib/python3.9/site-packages/sagemaker/base_predictor.py:185, in Predictor.predict(self, data, initial_args, target_model, target_variant, inference_id, custom_attributes)
138 """Return the inference from the specified endpoint.
139
140 Args:
(...)
174 as is.
175 """
177 request_args = self._create_request_args(
178 data,
179 initial_args,
(...)
183 custom_attributes,
184 )
--> 185 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
186 return self._handle_response(response)
File ~/SageMaker/.cs/conda/envs/codeserver_py39/lib/python3.9/site-packages/botocore/client.py:535, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
531 raise TypeError(
...
"code": 400,
"type": "InternalServerException",
"message": "/.sagemaker/mms/models/Sparticle__llama-2-7b-chat-japanese-lora does not appear to have a file named config.json. Checkout \u0027https://huggingface.co//.sagemaker/mms/models/Sparticle__llama-2-7b-chat-japanese-lora/None\u0027 for available files."
}
It seems that the model path is resolved incorrectly, so the config.json file cannot be found.
Any help with this would be greatly appreciated!
Hi,
This is a LoRA adapter, not a whole model, so it has an adapter_config.json instead of a config.json.
It must be used together with the Llama-2-7b-chat-hf model by Meta and cannot be used on its own. Please refer to https://github.com/tloen/alpaca-lora to see how to run this model.
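For reference, loading the adapter on top of the base model with the peft library looks roughly like this (a minimal sketch; the dtype and generation settings are just examples):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-chat-hf"               # base weights (gated; requires access)
adapter_id = "Sparticle/llama-2-7b-chat-japanese-lora"  # LoRA adapter weights

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # apply the LoRA weights on top

inputs = tokenizer("日本の首都はどこですか?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))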
Thanks for the quick reply. I've learned that a LoRA adapter is just a small auxiliary model that has to be used together with Llama-2-7b-chat-hf; I'll work through the material you linked at https://github.com/tloen/alpaca-lora.
Thank you very much for your help, I've reproduced the model's generation.
I want to train a LoRA model by fine-tuning on a customized Japanese dataset: merging my custom data with the current [izumi-lab/llm-japanese-dataset] dataset and then training the LoRA model with the method provided by [alpaca-lora]. Do you think that approach is correct? Thank you very much if you can provide guidance!
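Concretely, I was thinking of merging the datasets roughly like this before running the alpaca-lora fine-tuning script (a sketch; my_japanese_data.json and the column names are placeholders for my own data):

from datasets import load_dataset, concatenate_datasets

# The public Japanese instruction dataset mentioned above
base_ds = load_dataset("izumi-lab/llm-japanese-dataset", split="train")

# Placeholder for my custom data; it must expose the same columns
# (e.g. instruction / input / output) as the base dataset for concatenation to work
custom_ds = load_dataset("json", data_files="my_japanese_data.json", split="train")

merged = concatenate_datasets([base_ds, custom_ds]).shuffle(seed=42)
merged.to_json("merged_japanese_dataset.json")  # then pass this to alpaca-lora's finetune.py --data_path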
Hi,
It is hard to predict whether a fine-tuned model will perform well before you finish training it, and it is also hard to tell whether a training paradigm is 'normal' without actually trying it oneself. My advice is to be careful about the compatibility of the licenses of the datasets when merging them. Good luck with your endeavours.
Thanks for your suggestion.
Could you share the configuration of the machine you trained [Sparticle/llama-2-13b-chat-japanese-lora] on (GPU model, count, and VRAM size), the hyperparameters you used for training, and the total training time? We are training LoRA models like [Sparticle/llama-2-13b-chat-japanese-lora] ourselves and would like to learn from your setup.