google/t5-3b-ssm-nq · How to obtain 'score' outputs?

Jun 10, 2022

edited Jun 10, 2022

I wanted to get a score for each question I asked the model. When I followed the method suggested here: https://huggingface.co/docs/transformers/v4.19.3/en/internal/generation_utils#transformers.generation_utils.GreedySearchEncoderDecoderOutput.scores

I got this error:

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_fast.py in _decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
    545         if isinstance(token_ids, int):
    546             token_ids = [token_ids]
--> 547         text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
    548 
    549         if clean_up_tokenization_spaces:

TypeError: Can't convert {'sequences': [[0, 25439, 1]], 'scores': [[[-84.4974365234375, -35.67204284667969, -44.91609573364258, -27.592676162719727, -38.73073196411133, -37.58429718017578, -32.03908920288086, -41.34073257446289, -37.01685333251953, -38.45159149169922, -37.466190338134766, -32.51167297363281, -35.718780517578125, ...

As I understand it, the scores should be integers, and the decoder can't decode values that are not integers.
Any advice?

patrickvonplaten

Jun 10, 2022

Hey @EnesDS, I think you simply need to replace token_ids with token_ids.sequences in self._tokenizer.decode(...). Could you try this?
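
The traceback shows the whole generation output (both `sequences` and `scores`) reaching `decode`, which only accepts token ids. Below is a minimal sketch of separating the two. It uses plain Python lists shaped like the dict in the error message rather than a real model. The three-token vocabulary and the score values are made up for illustration, and `split_generation_output` is a hypothetical helper, not part of transformers.

```python
def split_generation_output(output):
    """Return (token_ids, step_scores) for the first sequence in the batch.

    `output` mimics the dict shown in the traceback:
      "sequences": [batch][sequence_length] token ids
      "scores":    [generated_steps][batch][vocab_size] floats
    """
    token_ids = output["sequences"][0]
    step_scores = [step[0] for step in output["scores"]]
    return token_ids, step_scores


# Made-up example with a 3-token vocabulary (hypothetical values).
output = {
    "sequences": [[0, 2, 1]],        # start token, one answer token, end token
    "scores": [[[-5.0, -3.0, 1.5]],  # step 1: scores over the vocabulary
               [[-4.0, 2.0, -1.0]]], # step 2
}

token_ids, step_scores = split_generation_output(output)

# With greedy search, each generated token is the arg-max of its step's scores.
# The first id in `sequences` is the decoder start token, which has no score.
greedy_ids = [max(range(len(s)), key=s.__getitem__) for s in step_scores]
```

With a real model, the analogous step would probably live in your own script rather than in the library file: call `model.generate(..., return_dict_in_generate=True, output_scores=True)`, then pass only `outputs.sequences[0]` to `tokenizer.decode(..., skip_special_tokens=True)` and keep `outputs.scores` for separate processing.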

EnesDS

Jun 17, 2022

edited Jun 17, 2022

Thank you, Patrick!
I tried this, but it still didn't work.
Here is the part of the file transformers/tokenization_utils_fast.py where I made the change:

Here is the output after the change:

This is such a great abstractive QA model! Being able to obtain a confidence score would open the door to many great uses.
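
One common way to turn such scores into a confidence value is sketched here, under the assumption that each entry of `scores` holds unnormalized logits over the vocabulary for one generated step: apply a log-softmax per step, pick out the log-probability of the token that was actually generated, and sum over steps. `sequence_confidence` is a hypothetical helper, and the numbers are invented, not model output.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def sequence_confidence(generated_ids, step_scores):
    """Return (per-token probabilities, probability of the whole sequence).

    generated_ids: token ids produced at each step (start token excluded)
    step_scores:   one list of vocabulary logits per generated step
    """
    token_log_probs = [
        log_softmax(scores)[token_id]
        for token_id, scores in zip(generated_ids, step_scores)
    ]
    token_probs = [math.exp(lp) for lp in token_log_probs]
    return token_probs, math.exp(sum(token_log_probs))


# Invented logits for a 3-token vocabulary and a 2-step generation.
step_scores = [[0.0, 0.0, math.log(2.0)],
               [0.0, math.log(6.0), 0.0]]
token_probs, seq_prob = sequence_confidence([2, 1], step_scores)
```

The result is the probability the model assigned to its own output, which may serve as a rough confidence signal; it is not a calibrated estimate of whether the answer is correct.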