Results are weird - can't recognize a standard name like 'michael'?

#5
by infoseek - opened

Something must be wrong with my configuration, I'm guessing? How could it completely miss a standard name?

[
{'word': 'my', 'entity_group': 'O', 'score': 0.9950373768806458},
{'word': 'name', 'entity_group': 'O', 'score': 0.9994168281555176},
{'word': 'is', 'entity_group': 'O', 'score': 0.9994277358055115},
{'word': 'michael', 'entity_group': 'O', 'score': 0.9982740879058838}
]

code is pretty standard:

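import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# g_local_model_id_path is the local model path; text is the input string
tokenizer = AutoTokenizer.from_pretrained(g_local_model_id_path)
model = AutoModelForTokenClassification.from_pretrained(g_local_model_id_path)
model.to("cpu")

# Tokenize the input (note: add_special_tokens=False is set here)
inputs = tokenizer(
    text, add_special_tokens=False, return_tensors="pt"
)

# Forward pass without gradient tracking
with torch.no_grad():
    logits = model(**inputs).logits

output_json = []

# Pick the highest-scoring class per token and map it to its label name
predictions = torch.argmax(logits, dim=2)
predicted_token_class = [model.config.id2label[t.item()] for t in predictions[0]]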

OK, well... after removing add_special_tokens=False, everything works as expected.
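One side effect of the fix, for anyone landing here: with special tokens enabled, the predictions also contain entries for positions such as [CLS] and [SEP] (the exact tokens depend on the tokenizer), and those usually should not end up in the output list. Below is a minimal sketch of filtering them out. The helper name drop_special_positions, the token strings, and the "B-PER" label are all assumptions for illustration; they are not taken from this model's actual label set or output.

```python
def drop_special_positions(tokens, labels, special_tokens_mask):
    """Pair each token with its predicted label, skipping special tokens.

    tokens: token strings, one per input position
    labels: predicted label per position, same length as tokens
    special_tokens_mask: 1 where the position is a special token, else 0
    """
    if not (len(tokens) == len(labels) == len(special_tokens_mask)):
        raise ValueError("tokens, labels and mask must have equal length")
    return [
        {"word": tok, "entity_group": lab}
        for tok, lab, is_special in zip(tokens, labels, special_tokens_mask)
        if not is_special
    ]


# Hypothetical values for illustration only. Real ones would come from the
# tokenizer and the model; these labels are made up, not real model output.
tokens = ["[CLS]", "my", "name", "is", "michael", "[SEP]"]
labels = ["O", "O", "O", "O", "B-PER", "O"]
mask = [1, 0, 0, 0, 0, 1]

result = drop_special_positions(tokens, labels, mask)
print(result)
```

With transformers, the mask can be requested by passing return_special_tokens_mask=True to the tokenizer call. It would likely need to be removed from the inputs dict before calling model(**inputs), since the model's forward pass does not expect that key.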

infoseek changed discussion status to closed
