madhurjindal/autonlp-Gibberish-Detector-492513457

Aug 21, 2023

Hi madhurjindal,
I tried to use your code to detect gibberish, but the output I received seemed strange and I didn't understand it. What I did was the following:

installed transformers, etc.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained("madhurjindal/autonlp-Gibberish-Detector-492513457")
tokenizer = AutoTokenizer.from_pretrained("madhurjindal/autonlp-Gibberish-Detector-492513457")
inputs = tokenizer("I like apples.", return_tensors="pt")
outputs = model(**inputs)
outputs

The output is: SequenceClassifierOutput(loss=None, logits=tensor([[ 2.0615, 0.1996, -2.1773, -0.5643]], grad_fn=), hidden_states=None, attentions=None)

Now, if I understood correctly, the tensor gives one weight/number for each of [clean, mild gibberish, word salad, noise]? What I don't understand is the meaning of the positive and negative signs and of the magnitude of each number. Also, I wanted to ask whether this code can identify a gibberish word within a sentence or document, instead of returning general numbers for the full sentence?

Thank you in advance for your reply.
Kind regards,
M

madhurjindal

Owner Aug 22, 2023

Well, the output is the pre-softmax output (logits), so its range is not fixed. The sign and size of a single value mean little on their own; what matters is how the values compare with each other. Please apply the softmax function to the logits to convert them to probabilities in the range (0, 1), then select the class with the highest probability. Hopefully this helps.
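
To make the softmax step concrete, here is a minimal sketch in plain Python that uses the logits posted in the question. The helper name `softmax` and the variable names are only illustrative. With the model loaded as in the question, `torch.softmax(outputs.logits, dim=-1)` should give the same probabilities directly.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    # and does not change the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits copied from the output in the question above.
logits = [2.0615, 0.1996, -2.1773, -0.5643]

probs = softmax(logits)

# Index of the most probable class.
predicted = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 3) for p in probs])
print(predicted)
```

For "I like apples." the class at index 0 gets a probability of about 0.8. Which label each index stands for is best read from `model.config.id2label` rather than assumed from the order listed in the question.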

madhurjindal changed discussion status to closed Aug 30, 2023


using Gibberish-Detector