Use model for calculating sequence perplexity

#4
by avuhong - opened

Hi,

Thank you for sharing your awesome work. I wonder whether this model can be used to reliably estimate the perplexity of designed sequences, as with its parent GPT-2 model? If so, could you give me some hints on how to do it? As shown in this thread, taking the average loss over the whole sequence seems suboptimal (https://huggingface.co/docs/transformers/perplexity).

Bests,
Ai

Hi avuhong!

Thanks for reaching out! Yes, you can compute perplexity for each sequence separately, and in fact I'd recommend doing so as a good way to evaluate them.

You can easily do that with the HuggingFace evaluate package (https://huggingface.co/spaces/evaluate-measurement/perplexity):

from evaluate import load

# input_texts is a list of protein sequences (strings) to score
perplexity = load("perplexity", module_type="measurement")
results = perplexity.compute(data=input_texts, model_id='nferruz/ProtGPT2')

As you will see from the link, you can also pass several sequences at once as a list (example copied from the link above):

import evaluate

perplexity = evaluate.load("perplexity", module_type="measurement")
input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]
results = perplexity.compute(model_id='gpt2',
                             add_start_token=False,
                             data=input_texts)
print(list(results.keys()))
>>>['perplexities', 'mean_perplexity']
print(round(results["mean_perplexity"], 2))
>>>646.74
print(round(results["perplexities"][0], 2))
>>>32.25
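
If you prefer to stay in plain transformers/torch, here is a minimal sketch of the equivalent per-sequence computation (my own illustration, not part of the evaluate docs; sequence_perplexity is a hypothetical helper name). Perplexity is the exponential of the mean token-level cross-entropy loss, so scoring each sequence on its own avoids the whole-corpus averaging you were worried about:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nferruz/ProtGPT2")
model = AutoModelForCausalLM.from_pretrained("nferruz/ProtGPT2")
model.eval()

def sequence_perplexity(sequence: str) -> float:
    # Passing the same ids as labels makes the model return the
    # mean cross-entropy loss over the sequence's tokens.
    input_ids = tokenizer.encode(sequence, return_tensors="pt")
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids).loss
    # Perplexity = exp(mean negative log-likelihood per token);
    # lower values mean the model finds the sequence more likely.
    return torch.exp(loss).item()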

I hope this helps! Let me know if there are further questions.

That's perfect, thank you very much ;)

avuhong changed discussion status to closed
