Beginner Question: usage with AutoTokenizer and AutoModel
#2 · by antoninoLorenzo
It may be that, as a beginner, I don't have a great understanding of the model's tensor outputs; in any case, I tried to use the model and I am unable to convert the output into a label.
I wrote the following class:
```python
import torch
from transformers import AutoTokenizer, AutoModel

class JailbreakClassifier:
    _model_name = "jackhhao/jailbreak-classifier"

    def __init__(self):
        self._tokenizer = AutoTokenizer.from_pretrained(self._model_name)
        self._model = AutoModel.from_pretrained(self._model_name)

    def predict(self, text: str):
        """Returns a label 'jailbreak' or 'benign'"""
        # Convert input text into tensors
        inputs = self._tokenizer(
            text,
            padding=True,
            truncation=True,
            return_tensors="pt"
        )
        # compute raw predictions
        with torch.no_grad():
            outputs = self._model(**inputs)
        # post-processing ?
```
`outputs` doesn't have the classic `logits` attribute; what I get back is a `BaseModelOutputWithPoolingAndCrossAttentions`.
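For reference, inspecting the output along these lines (a quick sketch; the attribute names below are the standard ones on a BERT-style base model output, not something I found documented for this checkpoint) only turns up hidden states:

```python
# ModelOutput objects are dict-like, so the attributes they carry can be listed
print(outputs.keys())  # e.g. ['last_hidden_state', 'pooler_output'] -- no 'logits'
```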
Hello!
So the problem here seems to be that you're loading the pretrained model using the base `AutoModel` class instead of the `AutoModelForSequenceClassification` class. This produces a `BaseModelOutputWithPoolingAndCrossAttentions` output, like you mentioned, instead of a `SequenceClassifierOutput` with the `loss` and `logits` properties.
You can just change it to this:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

self._tokenizer = AutoTokenizer.from_pretrained(self._model_name)
# use the task-specific downstream classification class
self._model = AutoModelForSequenceClassification.from_pretrained(self._model_name)
```
And that should do it.
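With that change, `outputs.logits` holds the raw class scores. Here's a minimal sketch of the full class including the post-processing step (assuming the checkpoint's config populates `id2label` with the 'benign'/'jailbreak' names, which you can verify via `model.config.id2label`):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class JailbreakClassifier:
    _model_name = "jackhhao/jailbreak-classifier"

    def __init__(self):
        self._tokenizer = AutoTokenizer.from_pretrained(self._model_name)
        # the downstream classification class adds a head that produces logits
        self._model = AutoModelForSequenceClassification.from_pretrained(self._model_name)
        self._model.eval()

    def predict(self, text: str) -> str:
        """Returns a label, e.g. 'jailbreak' or 'benign'."""
        inputs = self._tokenizer(
            text,
            padding=True,
            truncation=True,
            return_tensors="pt"
        )
        with torch.no_grad():
            outputs = self._model(**inputs)
        # logits has shape (batch_size, num_labels); pick the highest-scoring class
        predicted_id = outputs.logits.argmax(dim=-1).item()
        # id2label maps class indices to the label names stored in the config
        # (assumption: this checkpoint defines them; check model.config.id2label)
        return self._model.config.id2label[predicted_id]
```

Taking the argmax of the logits is enough for a hard label; apply `torch.softmax` to the logits first if you also want a confidence score.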
Potentially useful resources you can check out:
- The HuggingFace Transformers library documentation on model outputs: https://huggingface.co/docs/transformers/en/main_classes/output
- Tutorial for sequence classification using Transformers: https://huggingface.co/docs/transformers/en/tasks/sequence_classification
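Alternatively, if you don't need a custom class, the `pipeline` API wraps the tokenization, forward pass, and label mapping in one call (a sketch; the exact label and score returned depend on the checkpoint's config):

```python
from transformers import pipeline

# text-classification pipelines load the sequence-classification head automatically
classifier = pipeline("text-classification", model="jackhhao/jailbreak-classifier")
print(classifier("example prompt"))  # e.g. [{'label': 'benign', 'score': 0.99}]
```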