How to apply the model for inference

#2
by ZeyuZhang - opened

To whom it may concern,
Hi, this is Zeyu, a PhD student working on a similar project for data wrangling tasks. We came across this Jellyfish work and found it really cool, so we would like to use it as a baseline to compare against our approach. However, it seems I am not able to use the model directly the way I can with a regular Llama 2. For instance, here are the snippets I wrote to run your model for inference:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NECOUDBFM/Jellyfish")
model = AutoModelForCausalLM.from_pretrained("NECOUDBFM/Jellyfish")
prompt = "The input for an entity matching task"  # referred to the showcased prompt
inputs = tokenizer(prompt, return_tensors="pt")
generate_ids = model.generate(inputs.input_ids, max_length=30)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])

However, the printed output makes no sense to me; it mostly just repeats what I fed in. Moreover, it seems I am not able to feed the model inputs in batches; it keeps throwing a dimension-incompatibility error.
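For reference, my batched attempt was roughly along the lines of the sketch below (the prompts are just placeholders, not my actual inputs); I suspect the issue is that prompts of different lengths cannot be stacked into a single tensor:

prompts = ["prompt for record pair 1", "prompt for record pair 2"]  # placeholder prompts
# Without padding enabled, return_tensors="pt" cannot stack prompts of different
# lengths into one tensor and raises a shape/dimension error.
batch = tokenizer(prompts, return_tensors="pt")
generate_ids = model.generate(batch.input_ids, max_length=30)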

Looking forward to hearing from you.
Best,
Zeyu

NEC_OU Database Foundation Model org

Hello Zeyu,

Thank you for reaching out and expressing your interest in our model. We apologize for the difficulties you've encountered.

From the code you've shared, it appears that you're attempting to load Jellyfish directly via the Hugging Face API. Unfortunately, we don’t have direct control over how the model is handled during this process. Therefore, we strongly recommend manually downloading the model files before attempting to use Jellyfish for inference.
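For example, one simple way to fetch all of the model files locally is via the huggingface_hub library (a minimal sketch; the local directory below is just an example):

from huggingface_hub import snapshot_download

# Download the Jellyfish model files to a local directory (example path).
snapshot_download(repo_id="NECOUDBFM/Jellyfish", local_dir="/workspace/Jellyfish/")

You can then point the inference scripts below at that directory.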

To assist you further, we are providing two scripts specifically designed for inference. One is tailored for the vLLM module (https://github.com/vllm-project) and the other for the Transformers module (https://huggingface.co/docs/transformers/index).
Note that vLLM has demonstrated significantly improved inference efficiency.

Code Snippet 1 for vLLM:

from vllm import LLM, SamplingParams

model = LLM(model={path to Jellyfish files, e.g. "/workspace/Jellyfish/"})
sampling_params = SamplingParams(
    temperature={temperature, e.g. 0.35},
    top_p={top_p, e.g. 0.9},
    max_tokens={max new tokens, e.g. 1024},
    stop=["### Instruction:"],
)

system_message = "You are an AI assistant that follows instruction extremely well. Help as much as you can."
user_message = """You are tasked with determining whether two records listed below are the same based on the information provided.
Carefully compare the {attribute 1}, {attribute 2}... for each record before making your decision.
Note: Missing values (N/A or "nan") should not be used as a basis for your decision.
Record A: [{attribute 1}: {attribute 1 value}, {attribute 2}: {attribute 2 value}, ...]
Record B: [{attribute 1}: {attribute 1 value}, {attribute 2}: {attribute 2 value}, ...]
Are record A and record B the same entity? Choose your answer from: [Yes, No].
"""

prompt = f"{system_message}\n\n### Instruction:\n\n{user_message}\n\n### Response:\n\n"
outputs = model.generate(prompt, sampling_params)
response = outputs[0].outputs[0].text.strip()
print(response)
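
As a side note on the batch-style inference you mentioned, vLLM also accepts a list of prompts in a single generate call (a minimal sketch; user_message_1 and user_message_2 are placeholder prompts built the same way as above):

# Batched inference with vLLM: pass a list of prompts in one call.
prompts = [
    f"{system_message}\n\n### Instruction:\n\n{user_message_1}\n\n### Response:\n\n",
    f"{system_message}\n\n### Instruction:\n\n{user_message_2}\n\n### Response:\n\n",
]
outputs = model.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text.strip())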

Code Snippet 2 for Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

if torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

model = AutoModelForCausalLM.from_pretrained(
    {path_to_model},
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained({path_to_model})

prompt = f"{system_message}\n\n### Instruction:\n\n{user_message}\n\n### Response:\n\n"
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to(device)

generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.35,
    top_p=0.9,
)

with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=1024,
        pad_token_id=tokenizer.eos_token_id,
        repetition_penalty=1.15,
    )
output = generation_output.sequences
response = tokenizer.decode(
    output[:, input_ids.shape[-1]:][0], skip_special_tokens=True
).strip()

print(response)
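
Regarding the dimension error you saw when batching inputs, decoder-only models such as Jellyfish need a pad token and left padding so that prompts of different lengths can be stacked into one tensor. Here is a minimal sketch on top of the script above (the prompt list is just an example):

# Batched generation with Transformers: pad on the left so every prompt ends at the same position.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

prompts = ["first prompt ...", "second prompt ..."]  # example prompts
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(device)

with torch.no_grad():
    batch_output = model.generate(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        generation_config=generation_config,
        max_new_tokens=1024,
        pad_token_id=tokenizer.eos_token_id,
    )

# Strip the prompt tokens before decoding each generated answer.
for seq in batch_output[:, batch["input_ids"].shape[-1]:]:
    print(tokenizer.decode(seq, skip_special_tokens=True).strip())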

We hope these scripts will resolve the issues you're facing. Your participation in testing Jellyfish is immensely valuable to us, and we encourage you to reach out again should you have any further questions or need additional assistance.

Best regards,

Haochen, NECOUDBFM

NEC_OU Database Foundation Model org

Additionally, we want to inform you that we will soon provide code examples for accessing the Jellyfish model directly using the Hugging Face API. These examples will be included in the model card. If you prefer this approach, we kindly ask for your patience as we prepare this documentation.

Best regards,

Haochen, NECOUDBFM

NEC_OU Database Foundation Model org

Please see: https://huggingface.co/NECOUDBFM/Jellyfish#using-transformers-and-torch-modules
for inference with the original Hugging Face Transformers library.

Hi Haochen and Yuyang,
Thank you so much for your kind help! I just managed to download the model files from the Hugging Face repo as you suggested, and I will test the model in my upcoming experiments.

Thank you again.
Best,
Zeyu

HCZhang changed discussion status to closed
