teknium/Puffin-Phi-v2 · code to run on colab

ChatGPT gave me this code, which runs correctly on Google Colab:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Check whether a GPU is available and use it as the default device if so
device = "cuda" if torch.cuda.is_available() else "cpu"

sysprompt = "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"

model = AutoModelForCausalLM.from_pretrained("teknium/Puffin-Phi-v2", trust_remote_code=True, torch_dtype=torch.bfloat16).to(device)
tokenizer = AutoTokenizer.from_pretrained("teknium/Puffin-Phi-v2", trust_remote_code=True)

input_text = f"{sysprompt}USER: Write a negative review for the website Twitter.\nASSISTANT:"
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)

outputs = model.generate(input_ids, max_length=128, do_sample=True, temperature=0.2, top_p=0.9, use_cache=True, repetition_penalty=1.2, eos_token_id=tokenizer.eos_token_id)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)

That produces the following output:

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
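
The two warning lines above appear because generate is called with only input_ids. Here is a minimal sketch of one way to avoid them, assuming the same model and tokenizer objects as in the code above: call the tokenizer directly so that it also returns an attention_mask, and pass pad_token_id explicitly. The helper name generate_reply and its parameters are hypothetical, and it uses max_new_tokens in place of max_length so that the limit applies only to the generated tokens rather than prompt plus output.

```python
def generate_reply(model, tokenizer, prompt, device="cpu", max_new_tokens=128):
    """Generate text while passing the attention mask and pad token explicitly.

    Hypothetical helper: `model` and `tokenizer` are assumed to be the objects
    loaded with from_pretrained in the post above.
    """
    # Calling the tokenizer (instead of tokenizer.encode) returns both
    # input_ids and attention_mask.
    enc = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        enc["input_ids"],
        attention_mask=enc["attention_mask"],
        # The warning shows generate() falling back to EOS as the pad token,
        # so set it explicitly here.
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.2,
        top_p=0.9,
        repetition_penalty=1.2,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

With the variables from the post, it would be called as print(generate_reply(model, tokenizer, input_text, device)).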

The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: Write a negative review for the website Twitter.
ASSISTANT: Here is a negative review for your website Twitter:

"🔥 Terrible User Experience 🔥

First off, let's talk about the design. The layout of your site feels cluttered and hard to navigate, making it nearly impossible to find what you're looking for. Additionally, the color scheme is garish and unappealing, which only serves to further discourage users from exploring more.

Next, I have serious concerns about the performance
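
The printed text repeats the prompt before the reply and stops mid-sentence, most likely because max_length=128 counts the prompt tokens as well as the generated ones. Here is a small sketch for keeping only the reply, assuming a single-turn prompt in the USER:/ASSISTANT: format used above. The helper name extract_reply is hypothetical.

```python
def extract_reply(generated_text, marker="ASSISTANT:"):
    """Return the text after the first marker, or the whole text if it is absent.

    Assumes a single-turn prompt, so the first marker is the one that ends
    the prompt.
    """
    _, sep, after = generated_text.partition(marker)
    return after.strip() if sep else generated_text.strip()
```

For example, print(extract_reply(text)) would print the decoded output without the system prompt and the USER line.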