Different hypothesis format/text

#16
by marcj - opened

Just FYI, the hypothesis template used in the README's explanation (the manual PyTorch section) is different from the template the pipeline uses.

The README says:

we could construct a hypothesis of "This text is about politics.".

but the pipeline uses the following template:

"This example is {}."

which seems to be the correct one, since it yields good results.

So make sure not to use "This text is about {}." or any other variant. Using the correct hypothesis format is important for getting consistent, good predictions.
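For anyone taking the manual PyTorch route, one way to stay in sync with the pipeline is to build the hypothesis from the same template string. This is only a minimal sketch: `DEFAULT_TEMPLATE` and `build_hypothesis` are names made up for illustration, and the only thing taken from this thread is the template text itself.

```python
# Hypothetical helper for illustration; not part of transformers.
DEFAULT_TEMPLATE = "This example is {}."  # the template quoted above as the pipeline's


def build_hypothesis(label: str, template: str = DEFAULT_TEMPLATE) -> str:
    """Fill a candidate label into the hypothesis template via str.format."""
    if "{}" not in template:
        raise ValueError("template needs a {} placeholder for the label")
    return template.format(label)


# The resulting string is what would be paired with the input text (the premise)
# when encoding premise and hypothesis for the NLI model by hand.
print(build_hypothesis("price"))  # This example is price.
```

Passing a different `template` argument makes it easy to see exactly which sentence the model is being asked to judge.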

For example, if you slightly change the hypothesis text (because you think you are smarter) from "This example is {}." to "This example is about {}.", the score drops sharply:

# `classifier` is assumed to be a zero-shot-classification pipeline loaded with this model
text = "it's quite cheap, but the quality is not compromised – love the colors and the smooth application"
topic = "price"

# this is the correct one and is used by default
classifier(text, topic, hypothesis_template="This example is {}.")
# => {..., 'scores': [0.9757905602455139]}

# wrong, from the README
classifier(text, topic, hypothesis_template="This text is about {}.")
# => {..., 'scores': [0.2925598919391632]}

# wrong, too
classifier(text, topic, hypothesis_template="This example is about {}.")
# => {..., 'scores': [0.3993900716304779]}
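To repeat this comparison on your own text, a small loop over templates may be handy. This is a rough sketch, not anything from the library: `compare_templates` is a hypothetical helper, and it assumes `classifier` is callable the same way as above and returns a dict with a `scores` list.

```python
from typing import Callable, Dict, Iterable


def compare_templates(
    classifier: Callable[..., dict],
    text: str,
    label: str,
    templates: Iterable[str],
) -> Dict[str, float]:
    """Score one text/label pair under several hypothesis templates.

    Hypothetical helper: assumes classifier(text, label, hypothesis_template=...)
    returns a dict whose 'scores' list has one entry for the single label.
    """
    results = {}
    for template in templates:
        output = classifier(text, label, hypothesis_template=template)
        results[template] = output["scores"][0]
    return results


# Usage, assuming `classifier`, `text` and `topic` are defined as in the example above:
# scores = compare_templates(
#     classifier, text, topic,
#     ["This example is {}.", "This text is about {}.", "This example is about {}."],
# )
# for template, score in scores.items():
#     print(f"{score:.4f}  {template}")
```

Taking the classifier as an argument keeps the helper independent of how the pipeline was created, so it can also be tried with a stub before loading the model.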
