Unable to reproduce the same eval and test results
Dataset used: https://huggingface.co/datasets/conll2003
Evaluation metric: `load_metric("seqeval")`
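For context, a minimal sketch of the setup implied above (the variable names `raw_datasets` and `label_names` are my own; in recent `datasets` releases the seqeval metric has moved to the separate `evaluate` library):

```python
from datasets import load_dataset, load_metric

raw_datasets = load_dataset("conll2003")   # train / validation / test splits
metric = load_metric("seqeval")            # entity-level precision / recall / F1 / accuracy

# The string label names live in the dataset features:
# ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']
label_names = raw_datasets["train"].features["ner_tags"].feature.names
```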
**Results obtained:**
```
{'eval_loss': 2.3160810470581055,
 'eval_precision': 0.6153949670300094,
 'eval_recall': 0.7696061932009425,
 'eval_f1': 0.6839153518283106,
 'eval_accuracy': 0.9621769588508859,
 'eval_runtime': 556.8392,
 'eval_samples_per_second': 5.838,
 'eval_steps_per_second': 0.731}
```
NER label alignment code (from https://huggingface.co/course/chapter7/2):
```python
def align_labels_with_tokens(labels, word_ids):
    new_labels = []
    current_word = None
    for word_id in word_ids:
        if word_id != current_word:
            # Start of a new word!
            current_word = word_id
            label = -100 if word_id is None else labels[word_id]
            new_labels.append(label)
        elif word_id is None:
            # Special token
            new_labels.append(-100)
        else:
            # Same word as previous token
            label = labels[word_id]
            # If the label is B-XXX we change it to I-XXX
            if label % 2 == 1:
                label += 1
            new_labels.append(label)
    return new_labels
```
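In case it matters for reproduction, this is roughly how that helper is wired into preprocessing in the same course chapter (assuming a *fast* tokenizer object named `tokenizer`, since `word_ids()` is only available on fast tokenizers, and the `raw_datasets` object from above):

```python
def tokenize_and_align_labels(examples):
    # Tokenize pre-split words and realign the word-level NER tags
    tokenized_inputs = tokenizer(
        examples["tokens"], truncation=True, is_split_into_words=True
    )
    new_labels = []
    for i, labels in enumerate(examples["ner_tags"]):
        word_ids = tokenized_inputs.word_ids(i)
        new_labels.append(align_labels_with_tokens(labels, word_ids))
    tokenized_inputs["labels"] = new_labels
    return tokenized_inputs

tokenized_datasets = raw_datasets.map(
    tokenize_and_align_labels,
    batched=True,
    remove_columns=raw_datasets["train"].column_names,
)
```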
Metric computation:
```python
import numpy as np

def compute_metrics(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    # References are built from the dataset's label_names list
    true_labels = [[label_names[l] for l in label if l != -100] for label in labels]
    # Predictions are mapped through id2labels taken from the model
    true_predictions = [
        [id2labels[str(p)] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    all_metrics = metric.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": all_metrics["overall_precision"],
        "recall": all_metrics["overall_recall"],
        "f1": all_metrics["overall_f1"],
        "accuracy": all_metrics["overall_accuracy"],
    }
```
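A sketch of how `compute_metrics` is typically hooked into evaluation with `Trainer` (the `model`, `tokenizer`, and `TrainingArguments` values below are placeholders, not the exact configuration used for the numbers above):

```python
from transformers import DataCollatorForTokenClassification, Trainer, TrainingArguments

data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=TrainingArguments("ner-eval", per_device_eval_batch_size=8),
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
print(trainer.evaluate())  # yields an eval_* dictionary like the one shown above
```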
Note: I am using `id2labels` from your model (rather than the dataset's `label_names`) to map prediction ids back to label strings. Please comment on this.
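Regarding that note: since the references come from the dataset's `label_names` while the predictions go through `id2labels`, a hypothetical sanity check (assuming the loaded `model` and the `raw_datasets` object from above) is to compare the two mappings index by index:

```python
dataset_labels = raw_datasets["train"].features["ner_tags"].feature.names
# model.config.id2label has integer keys when loaded through transformers;
# a raw config.json gives string keys, as in the id2labels[str(p)] lookup above.
model_labels = [model.config.id2label[i] for i in range(model.config.num_labels)]

if dataset_labels != model_labels:
    print("Label mappings differ:")
    print("dataset:", dataset_labels)
    print("model  :", model_labels)
```

If the two lists differ in order or content, the predicted tag strings will not line up with the references, which could account for the mismatch.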
Was there any update on this? Did you manage to reproduce the results?
no