Reproducibility of results #2
opened by tprochenka
Hi, I've read your blog post, and the results seem quite promising. I have two questions:
- On what machine did you run your tests (inference and fine-tuning)? I tried inference on an AWS g5.2xlarge (24 GB GPU RAM) and it was still not enough, so I had to put the model on the CPU.
- Have you been able to get such good results consistently? Mine look much worse. Maybe I'm missing something. Could you please take a look?
from transformers import LlamaTokenizer, LlamaForCausalLM
from peft import PeftModel
# import bitsandbytes as bnb

device = "cpu"
base = "decapoda-research/llama-7b-hf"
finetuned = "mmosiolek/polpaca-lora-7b"

# Tokenizer setup: pad with token id 0 and pad on the left
tokenizer = LlamaTokenizer.from_pretrained(base)
tokenizer.pad_token_id = 0
tokenizer.padding_side = "left"

# Load the base model, then apply the LoRA adapter on top of it
model = LlamaForCausalLM.from_pretrained(base)
model = PeftModel.from_pretrained(model, finetuned).to(device)
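For completeness, this is the kind of 8-bit load I assume could fit the model on the 24 GB card instead of falling back to CPU (it needs bitsandbytes and accelerate installed; I haven't verified it on the g5.2xlarge, so treat it as a sketch):

# Hypothetical 8-bit variant (unverified): quantize the base weights so the
# 7B model fits on a single 24 GB GPU. Requires bitsandbytes + accelerate.
import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

model_8bit = LlamaForCausalLM.from_pretrained(
    base,
    load_in_8bit=True,        # int8 quantization via bitsandbytes
    torch_dtype=torch.float16,
    device_map="auto",        # let accelerate place the layers on the GPU
)
model_8bit = PeftModel.from_pretrained(model_8bit, finetuned)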
from transformers import GenerationConfig
import torch

# Conservative decoding settings: low temperature plus 4-beam search
config = GenerationConfig(
    temperature=0.1,
    top_p=0.5,
    top_k=40,
    num_beams=4,
    max_new_tokens=128,
    repetition_penalty=1.2,
)
def run(instruction, model, tokenizer, config, device):
    encodings = tokenizer(instruction, padding=True, return_tensors="pt").to(device)
    generated_ids = model.generate(
        **encodings,
        generation_config=config,
    )
    decoded = tokenizer.batch_decode(generated_ids)
    del encodings, generated_ids
    torch.cuda.empty_cache()
    # Return only the last line, i.e. the model's continuation of the prompt
    return decoded[0].split("\n")[-1]
run("Wymyśl kilka zapytań w google na temat kodowania.", model, tokenizer, config, device)
Thanks
Tomek
Hey,
here are the answers:
- I ran the training + evaluation on an RTX 4090.
- Please don't expect consistent results from this kind of toy experiment. It's more something to play with and to understand the limitations of the approach I took. What I'd suggest in your case is to adjust the temperature - try increasing it (see the sketch below). That should give more meaningful results than simply repeating the input.
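For example, something along these lines (the exact values are only a starting point, not settings I tuned; note that temperature only has an effect when do_sample=True, so this enables sampling instead of beam search):

from transformers import GenerationConfig

# Illustrative sampling config: higher temperature + nucleus sampling.
# The values are guesses to start from, not tuned settings.
sampling_config = GenerationConfig(
    do_sample=True,        # temperature/top_p only apply when sampling
    temperature=0.7,       # try values roughly in the 0.5-1.0 range
    top_p=0.9,
    top_k=40,
    max_new_tokens=128,
    repetition_penalty=1.2,
)

print(run("Wymyśl kilka zapytań w google na temat kodowania.", model, tokenizer, sampling_config, device))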
I hope it helps! :)