Can't reproduce this model's predictions on SageMaker

#1
by longquan - opened
Sparticle org

SageMaker configuration information: HF_MODEL_ID is 'Sparticle/llama-2-7b-chat-japanese-lora' and HF_TASK is 'question-answering', i.e. the endpoint is configured for question answering.

from sagemaker.huggingface import HuggingFaceModel

# Hub Model configuration. https://huggingface.co/models
hub = {
  'HF_MODEL_ID':'Sparticle/llama-2-7b-chat-japanese-lora', # model_id from hf.co/models
  'HF_TASK':'question-answering' # NLP task you want to use for predictions
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   env=hub,
   role=role, # iam role with permissions to create an Endpoint
   transformers_version="4.26", # transformers version used
   pytorch_version="1.13", # pytorch version used
   py_version="py39", # python version of the DLC
)
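
The snippet above only builds the model object; the predictor in the traceback comes from a deploy call. A minimal sketch of that missing step (the instance type here is an assumption, not from the original post; a 7B model generally needs a GPU instance with enough memory):

```python
# Deploy to a real-time endpoint and query it (sketch; the instance type is
# an assumption -- choose one with enough GPU memory for the model).
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# Payload for the question-answering task, as in the traceback below.
data = {
    "inputs": {
        "question": "日本の首都はどこですか?",
        "context": "日本の首都は東京です。",
    }
}
predictor.predict(data)
```

This is the call path that produces the error shown below.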

Error information:

ModelError                                Traceback (most recent call last)
Cell In[25], line 5
      2 data = {"inputs": {"question": "日本の首都はどこですか?","context": "日本の首都は東京です。"}}
      4 # request
----> 5 predictor.predict(data)

File ~/SageMaker/.cs/conda/envs/codeserver_py39/lib/python3.9/site-packages/sagemaker/base_predictor.py:185, in Predictor.predict(self, data, initial_args, target_model, target_variant, inference_id, custom_attributes)
    138 """Return the inference from the specified endpoint.
    139 
    140 Args:
   (...)
    174         as is.
    175 """
    177 request_args = self._create_request_args(
    178     data,
    179     initial_args,
   (...)
    183     custom_attributes,
    184 )
--> 185 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    186 return self._handle_response(response)

File ~/SageMaker/.cs/conda/envs/codeserver_py39/lib/python3.9/site-packages/botocore/client.py:535, in ClientCreator._create_api_method.._api_call(self, *args, **kwargs)
    531     raise TypeError(
...
  "code": 400,
  "type": "InternalServerException",
  "message": "/.sagemaker/mms/models/Sparticle__llama-2-7b-chat-japanese-lora does not appear to have a file named config.json. Checkout \u0027https://huggingface.co//.sagemaker/mms/models/Sparticle__llama-2-7b-chat-japanese-lora/None\u0027 for available files."
}

It seems that the path is incorrect and the config.json file cannot be found.
Any help with this would be greatly appreciated!

Sparticle org

Hi,
This is a LoRA adapter file, not a whole model, so it has an adapter_config.json instead of a config.json.
It must be used with the Llama-2-7b-chat-hf model by Meta and cannot be used alone. Please refer to https://github.com/tloen/alpaca-lora to see how to run this model.

Thanks for the quick reply. I have found out that the LoRA adapter file is just a small auxiliary model that needs to be used in conjunction with Llama-2-7b-chat-hf; I will refer to the information you provided: https://github.com/tloen/alpaca-lora

longquan changed discussion status to closed
longquan changed discussion status to open

Thank you very much for your help; I've reproduced the model generation.
I want to train a LoRA model by fine-tuning on a customized Japanese dataset: merging the customized dataset with the current [izumi-lab/llm-japanese-dataset] dataset and then training the LoRA model with the method provided by [alpaca-lora]. Do you think that approach is correct? Thank you very much if you can provide guidance!

Sparticle org

Hi,
It is hard to predict whether a fine-tuned model will perform well before you finish training it, and it is also hard to tell whether a training paradigm is 'normal' without actually trying it oneself. My advice is to be careful about the compatibility of the licenses of the datasets when merging them. Good luck with your endeavours.

Sparticle org

Thanks for your suggestion.
Could you provide the configuration of the machine you trained [Sparticle/llama-2-13b-chat-japanese-lora] on (GPU model, number of GPUs, and GPU memory size), the hyperparameters used for training, and the total training time? We are training LoRA models like [Sparticle/llama-2-13b-chat-japanese-lora] and would like to learn about your training setup.

longquan changed discussion status to closed
