How to determine whether the final model output is good or bad?
#1, opened by Jeremy1110
Hello, author. I would like to ask about the final output logits of your model. What range of values should I expect for it to be considered reasonable?
I tested it with a random target (tgt) and also with a translation model's output. The first case produced a value of 0.64, while the second resulted in 0.7.
Additionally, I am using a translation model, but it sometimes generates hallucinated output. Would it be possible to filter such outputs using Quality Estimation?
Hello! This model is for sentence-level QE. Hence, it is better to use COMET.
As for what counts as a good score, there is no specific threshold. Usually, we compare the scores of two systems, e.g. a baseline vs. a fine-tuned model.
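To make the comparison idea concrete, here is a minimal sketch rather than an official recipe. It assumes you already have one QE score per sentence for each system, computed on the same source sentences. The function name `compare_systems`, the bootstrap resampling step, and the example scores are all illustrative assumptions, not part of the model or of COMET.

```python
import random
from statistics import mean


def compare_systems(baseline_scores, candidate_scores, n_resamples=1000, seed=0):
    """Compare two systems using per-sentence QE scores for the same sources.

    Hypothetical helper: returns both mean scores and the fraction of
    bootstrap resamples in which the candidate's mean beats the baseline's.
    """
    if len(baseline_scores) != len(candidate_scores):
        raise ValueError("Both systems must be scored on the same sentences.")
    if not baseline_scores:
        raise ValueError("Score lists must not be empty.")

    rng = random.Random(seed)
    n = len(baseline_scores)
    wins = 0
    for _ in range(n_resamples):
        # Resample sentence indices with replacement (paired bootstrap).
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_baseline = mean(baseline_scores[i] for i in idx)
        resampled_candidate = mean(candidate_scores[i] for i in idx)
        if resampled_candidate > resampled_baseline:
            wins += 1

    return {
        "baseline_mean": mean(baseline_scores),
        "candidate_mean": mean(candidate_scores),
        "candidate_win_rate": wins / n_resamples,
    }


# Made-up scores, for illustration only.
baseline = [0.60, 0.64, 0.58, 0.70]
fine_tuned = [0.66, 0.70, 0.61, 0.75]
result = compare_systems(baseline, fine_tuned)
print(result)
```

Read this way, a single value such as 0.64 or 0.7 says little on its own. What matters is whether one system scores consistently higher than the other on the same inputs.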
ymoslem changed discussion status to closed