Using Gibberish-Detector

#7
by martyD - opened

Hi madhurjindal,
I tried to use your code to detect gibberish, but the output I received seemed strange and I didn't understand it. Here is what I did:

  • installed transformers, etc.
  • from transformers import AutoModelForSequenceClassification, AutoTokenizer
    model = AutoModelForSequenceClassification.from_pretrained("madhurjindal/autonlp-Gibberish-Detector-492513457")
    tokenizer = AutoTokenizer.from_pretrained("madhurjindal/autonlp-Gibberish-Detector-492513457")
  • inputs = tokenizer("I like apples.", return_tensors="pt")
    outputs = model(**inputs)
    outputs

The output is: SequenceClassifierOutput(loss=None, logits=tensor([[ 2.0615, 0.1996, -2.1773, -0.5643]], grad_fn=), hidden_states=None, attentions=None)

Now, if I understood correctly, the tensor gives one number for each class, [clean, mild gibberish, word salad, noise]? What I don't understand is what the sign and the magnitude of these numbers mean. I also wanted to ask whether this code can identify a gibberish word within a sentence or document, instead of returning only general scores for the full sentence.

Thank you in advance for your reply.
Kind regards,
M

Well, the output is the pre-softmax output (the logits), so its range is not fixed. Apply the softmax function on top of the output to convert it into probabilities in the range (0, 1), then select the class with the highest probability. Hopefully this helps.
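To make the reply concrete, here is a minimal pure-Python sketch that applies softmax to the logits posted in the question. Logits are unnormalized scores: a larger value means that class is more likely relative to the others, and the sign on its own carries no meaning, because softmax only depends on the differences between logits (adding the same constant to every logit leaves the probabilities unchanged). Which class name belongs to each index is not stated in the thread; it is stored in the model's `model.config.id2label`, so the sketch only reports the index.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits into probabilities in (0, 1) that sum to 1."""
    # Subtracting the largest logit avoids overflow in exp();
    # softmax is shift-invariant, so the result is the same.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied from the output posted in the question.
logits = [2.0615, 0.1996, -2.1773, -0.5643]
probs = softmax(logits)

# Index of the most probable class; look its name up in model.config.id2label.
best = max(range(len(probs)), key=lambda i: probs[i])

print([round(p, 3) for p in probs])
print("predicted class index:", best)
```

For these logits the first class gets a probability of roughly 0.8. With PyTorch directly, `torch.softmax(outputs.logits, dim=-1)` computes the same thing on the tensor, and `model.config.id2label[int(probs.argmax())]` gives the predicted label name.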

madhurjindal changed discussion status to closed
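The reply does not cover the second question, locating individual gibberish words. As used in the question, the model returns one set of logits for the whole input text. A rough workaround, which is an assumption and not something the model author suggests, is to split the text on whitespace and classify each word on its own. Treat the results with caution: the model was presumably trained on whole sentences, and a single word can hardly be "word salad", so per-word scores may be unreliable. The names `score_words` and `classify` are hypothetical; `classify` stands for any function that maps a string to a list of logits.

```python
import math
from typing import Callable, List, Mapping, Sequence, Tuple, Union


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_words(
    text: str,
    classify: Callable[[str], Sequence[float]],
    labels: Union[Sequence[str], Mapping[int, str]],
) -> List[Tuple[str, str, float]]:
    """Heuristic: classify every whitespace-separated word independently.

    Returns (word, predicted label, probability of that label) per word.
    """
    results = []
    for word in text.split():
        probs = softmax(classify(word))
        best = max(range(len(probs)), key=lambda i: probs[i])
        results.append((word, labels[best], probs[best]))
    return results
```

With the model and tokenizer from the question, `classify` could be `lambda w: model(**tokenizer(w, return_tensors="pt")).logits[0].tolist()` and `labels` could be `model.config.id2label`. Scoring short sliding windows of a few words instead of single words is a variation of the same idea that gives the model more context.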
