johnsnowlabs/JSL-MedLlama-3-8B-v1.0

16 days ago

There isn't a chat template yet, so the script proposed in the model card doesn't work:
"No chat template is defined for this tokenizer - using a default chat template that implements the ChatML format...."

As a result, it falls back to ChatML instead of the Llama chat template, and generation doesn't work.

abideen

John Snow Labs org 16 days ago

Please use the following snippet for inference:

!pip install -qU transformers accelerate bitsandbytes

from transformers import AutoTokenizer
import transformers
import torch

model = "johnsnowlabs/JSL-MedLlama-3-8B-v1.0"

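# Load the tokenizer explicitly and hand it to the pipeline so it is actually used
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16, "load_in_4bit": True},
)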

# Plain-text prompt format; no chat template is applied
question = '''###Question: What is paracetamol? Explain briefly. ###Answer:'''
prompt = f"""{question}"""
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
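
If you want to reuse the prompt format from the snippet above for other questions, here is a minimal sketch of two helpers. `build_prompt` and `extract_answer` are hypothetical names, not part of the model card or of transformers. Cutting the output at the next `###` marker is an assumption about how the model may keep generating, so adjust it if it trims too much.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the '###Question: ... ###Answer:' format used above."""
    return f"###Question: {question.strip()} ###Answer:"


def extract_answer(generated_text: str, prompt: str) -> str:
    """Return only the model's answer from the pipeline's generated_text.

    By default the text-generation pipeline echoes the prompt at the start
    of generated_text, so it is removed first.
    """
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    # Assumption: a further '###' marker means the model began a new turn.
    return generated_text.split("###", 1)[0].strip()
```

Usage would look like `prompt = build_prompt("What is paracetamol? Explain briefly.")`, then `extract_answer(outputs[0]["generated_text"], prompt)` after calling the pipeline as shown above.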

abideen changed discussion status to closed 16 days ago