Can't reproduce this model's predictions on SageMaker

#1
by longquan - opened
Sparticle org

SageMaker configuration information: HF_MODEL_ID is 'Sparticle/llama-2-7b-chat-japanese-lora' and HF_TASK is 'question-answering', i.e. the endpoint is configured for question answering.

from sagemaker.huggingface import HuggingFaceModel

# Hub Model configuration. https://huggingface.co/models
hub = {
  'HF_MODEL_ID':'Sparticle/llama-2-7b-chat-japanese-lora', # model_id from hf.co/models
  'HF_TASK':'question-answering' # NLP task you want to use for predictions
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   env=hub,
   role=role, # iam role with permissions to create an Endpoint
   transformers_version="4.26", # transformers version used
   pytorch_version="1.13", # pytorch version used
   py_version="py39", # python version of the DLC
)
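
The snippet above only builds the model object; the predictor in the traceback comes from a deploy call. A minimal sketch of that missing step (the instance type here is an assumption, not from the original post; a 7B model generally needs a GPU instance with enough memory):

```python
# Deploy to a real-time endpoint and query it (sketch; the instance type is
# an assumption -- choose one with enough GPU memory for the model).
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# Payload for the question-answering task, as in the traceback below.
data = {
    "inputs": {
        "question": "日本の首都はどこですか?",
        "context": "日本の首都は東京です。",
    }
}
predictor.predict(data)
```

This is the call path that produces the error shown below.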

Error information:

ModelError                                Traceback (most recent call last)
Cell In[25], line 5
      2 data = {"inputs": {"question": "日本の首都はどこですか?","context": "日本の首都は東京です。"}}
      4 # request
----> 5 predictor.predict(data)

File ~/SageMaker/.cs/conda/envs/codeserver_py39/lib/python3.9/site-packages/sagemaker/base_predictor.py:185, in Predictor.predict(self, data, initial_args, target_model, target_variant, inference_id, custom_attributes)
    138 """Return the inference from the specified endpoint.
    139 
    140 Args:
   (...)
    174         as is.
    175 """
    177 request_args = self._create_request_args(
    178     data,
    179     initial_args,
   (...)
    183     custom_attributes,
    184 )
--> 185 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    186 return self._handle_response(response)

File ~/SageMaker/.cs/conda/envs/codeserver_py39/lib/python3.9/site-packages/botocore/client.py:535, in ClientCreator._create_api_method.._api_call(self, *args, **kwargs)
    531     raise TypeError(
...
  "code": 400,
  "type": "InternalServerException",
  "message": "/.sagemaker/mms/models/Sparticle__llama-2-7b-chat-japanese-lora does not appear to have a file named config.json. Checkout \u0027https://huggingface.co//.sagemaker/mms/models/Sparticle__llama-2-7b-chat-japanese-lora/None\u0027 for available files."
}

It seems that the path is incorrect and the config.json file cannot be found.
Any help with this would be greatly appreciated!

Sparticle org

Hi,
This is a LoRA adapter file, not a whole model, so it has an adapter_config.json instead of a config.json.
It must be used with the Llama-2-7b-chat-hf model by Meta and cannot be used alone. Please refer to https://github.com/tloen/alpaca-lora to see how to run this model.

Thanks for the quick reply. I have found out that the LoRA adapter file is just a small auxiliary model that needs to be used in conjunction with Llama-2-7b-chat-hf; I will refer to the information you provided: https://github.com/tloen/alpaca-lora

longquan changed discussion status to closed
longquan changed discussion status to open

Thank you very much for your help; I've reproduced the model generation.
I want to train a LoRA model by fine-tuning on a customized Japanese dataset: merging the customized dataset with the current [izumi-lab/llm-japanese-dataset] dataset and then training the LoRA model with the method provided by [alpaca-lora]. Do you think that approach is correct? Thank you very much if you can provide guidance!

Sparticle org

Hi,
It is hard to predict whether a fine-tuned model will perform well before you finish training it, and it is also hard to tell whether a training paradigm is 'normal' without actually trying it oneself. My advice is to be careful about the compatibility of the licenses of the datasets when merging them. Good luck with your endeavours.

Sparticle org

Thanks for your suggestion.
Could you provide the configuration of the machine you trained [Sparticle/llama-2-13b-chat-japanese-lora] on (GPU model, number of GPUs, and GPU memory size), the hyperparameters used for training, and the total training time? We are training LoRA models like [Sparticle/llama-2-13b-chat-japanese-lora] and would like to learn about your training setup.

longquan changed discussion status to closed
