How to determine whether the final model output is good or bad?

#1
by Jeremy1110 - opened

Hello, author. I would like to ask about the final output logits of your model. What range of values should I expect for it to be considered reasonable?

I tested it with a random tgt and also with a translation model's output. The first case produced a value of 0.64, while the second case resulted in 0.7.

Additionally, I am using a translation model, but it sometimes generates hallucinated output. Would it be possible to filter such outputs using Quality Estimation?

Hello! This model is for sentence-level QE. Hence, it is better to use COMET.
As for what counts as a good score, there is no specific threshold. Usually, we compare two systems scored by the same model, e.g. a baseline vs. a fine-tuned model.
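
To make the comparison idea concrete, here is a minimal sketch, assuming you already have one QE score per sentence for each system, computed by the same QE model on the same source sentences. The function name `compare_systems` and the score lists are hypothetical and only for illustration; they are not part of this model's API, and the numbers are made up.

```python
from statistics import mean


def compare_systems(baseline_scores, candidate_scores):
    """Compare two MT systems using per-sentence QE scores.

    Both lists must come from the same QE model and the same source
    sentences, in the same order. (Hypothetical helper, not a library API.)
    """
    if len(baseline_scores) != len(candidate_scores):
        raise ValueError("Both systems must be scored on the same sentences.")
    if not baseline_scores:
        raise ValueError("Need at least one scored sentence.")

    # Per-sentence difference: positive means the candidate scored higher.
    deltas = [c - b for b, c in zip(baseline_scores, candidate_scores)]
    wins = sum(d > 0 for d in deltas)

    return {
        "baseline_mean": mean(baseline_scores),
        "candidate_mean": mean(candidate_scores),
        "mean_delta": mean(deltas),
        "win_rate": wins / len(deltas),
    }


# Made-up scores for four sentences, for illustration only.
baseline = [0.62, 0.70, 0.55, 0.68]
finetuned = [0.66, 0.71, 0.54, 0.74]

result = compare_systems(baseline, finetuned)
print(result)
```

The point of the sketch is that a single value such as 0.64 or 0.7 says little on its own; the difference between two systems on the same data is what gets interpreted. On a real test set you would likely want many more sentences and a significance test before drawing conclusions.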

ymoslem changed discussion status to closed