xlmr-base-ro-re: Multilingual (English + Romanian) encoder.
Dragoș-Mitruț Vasile · Elena-Simona Apostol · Ștefan-Adrian Toma · Adrian Paschke · Ciprian-Octavian Truică
Fine-tuned xlm-roberta-base (278M parameters) for relation classification on a Romanian translation of SemEval-2010 Task 8.
Code: github.com/DS4AI-UPB/crosslingual-romanian-re.
Results (macro F1-Score, SemEval-2010 Task 8 test set)
| Language | macro F1-Score |
|---|---|
| English | .853 |
| Romanian | .822 |
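
For readers unfamiliar with the metric, the sketch below shows how a plain macro F1 is computed: the per-label F1 scores are averaged with equal weight, regardless of label frequency. It is an illustration only. The function name `macro_f1` and the toy labels are assumptions, and the exact scoring variant behind the numbers above (for example, whether `Other` is included in the average) is not stated here, so check the paper or the repository before comparing scores.

```python
from collections import Counter

def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # label p was predicted but is wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is never zero here
        # because every label occurs at least once in gold or pred.
        scores.append(2 * tp[label] / (2 * tp[label] + fp[label] + fn[label]))
    return sum(scores) / len(scores)

# Toy example (made-up predictions, not model output).
gold = ["Cause-Effect(e1,e2)", "Other", "Other", "Message-Topic(e1,e2)"]
pred = ["Cause-Effect(e1,e2)", "Other", "Message-Topic(e1,e2)", "Message-Topic(e1,e2)"]
score = macro_f1(gold, pred)
print(round(score, 3))  # 0.778
```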
How it works
The model takes a sentence in which the two entities are wrapped in <e1>…</e1> and <e2>…</e2> tags. Before tokenization, these tags are mapped to four special tokens ([E1], [/E1], [E2], [/E2]) that were added to the vocabulary during fine-tuning. The classifier head predicts one of 19 directional labels (e.g. Cause-Effect(e1,e2) vs. Cause-Effect(e2,e1)), which collapse to the 10 coarse SemEval relations: nine relation types plus Other.
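The collapse from directional to coarse labels can be sketched as a simple string operation: drop the trailing `(e1,e2)` or `(e2,e1)` suffix and leave `Other` unchanged. This is a minimal illustration. The helper name `coarse_label` is an assumption and is not taken from the code repository.

```python
import re

# Matches the direction suffix at the end of a directional label.
_DIRECTION = re.compile(r"\((?:e1,e2|e2,e1)\)$")

def coarse_label(label):
    """Strip the direction suffix; labels without one (e.g. "Other") pass through."""
    return _DIRECTION.sub("", label)

# Rebuild the 19 directional labels from the nine relation types plus Other.
BASE_RELATIONS = [
    "Cause-Effect", "Instrument-Agency", "Product-Producer",
    "Content-Container", "Entity-Origin", "Entity-Destination",
    "Component-Whole", "Member-Collection", "Message-Topic",
]
directional = [f"{r}({d})" for r in BASE_RELATIONS for d in ("e1,e2", "e2,e1")]
directional.append("Other")

coarse = sorted({coarse_label(label) for label in directional})
print(len(directional), len(coarse))  # 19 10
```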
Usage
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# The 19 directional labels, indexed by the classifier's output position.
RELATIONS = [
    "Cause-Effect(e1,e2)", "Cause-Effect(e2,e1)",
    "Instrument-Agency(e1,e2)", "Instrument-Agency(e2,e1)",
    "Product-Producer(e1,e2)", "Product-Producer(e2,e1)",
    "Content-Container(e1,e2)", "Content-Container(e2,e1)",
    "Entity-Origin(e1,e2)", "Entity-Origin(e2,e1)",
    "Entity-Destination(e1,e2)", "Entity-Destination(e2,e1)",
    "Component-Whole(e1,e2)", "Component-Whole(e2,e1)",
    "Member-Collection(e1,e2)", "Member-Collection(e2,e1)",
    "Message-Topic(e1,e2)", "Message-Topic(e2,e1)",
    "Other",
]

tok = AutoTokenizer.from_pretrained("DS4AI-UPB/xlmr-base-ro-re")
model = AutoModelForSequenceClassification.from_pretrained("DS4AI-UPB/xlmr-base-ro-re").eval()

def convert_markers(text):
    """Map the <e1>/<e2> tags to the special tokens added during fine-tuning."""
    text = text.replace("<e1>", "[E1] ").replace("</e1>", " [/E1]")
    return text.replace("<e2>", "[E2] ").replace("</e2>", " [/E2]")

sentence = "<e1>Furtuna</e1> a provocat mari <e2>pagube</e2>."
inputs = tok(convert_markers(sentence), return_tensors="pt", truncation=True, max_length=192)

# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    pred = model(**inputs).logits.argmax(-1).item()
print(RELATIONS[pred])  # Cause-Effect(e1,e2)
A ready-to-use script is available as infer_encoder.py in the code repository.
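If your data stores entities as character offsets rather than inline tags, a small helper can produce the tagged input format the model expects. The sketch below is one possible way to do it. `tag_entities` is a hypothetical helper, not part of the repository, and it assumes half-open `(start, end)` character spans that do not overlap.

```python
def tag_entities(text, e1_span, e2_span):
    """Wrap two non-overlapping character spans in <e1>/<e2> tags."""
    (s1, t1), (s2, t2) = e1_span, e2_span
    if not (0 <= s1 < t1 <= len(text) and 0 <= s2 < t2 <= len(text)):
        raise ValueError("spans must be non-empty and inside the text")
    if s1 < t2 and s2 < t1:
        raise ValueError("entity spans must not overlap")
    # Insert tags from right to left so the earlier offsets stay valid.
    for start, end, name in sorted([(s1, t1, "e1"), (s2, t2, "e2")], reverse=True):
        text = f"{text[:start]}<{name}>{text[start:end]}</{name}>{text[end:]}"
    return text

sentence = "Furtuna a provocat mari pagube."
start = sentence.index("pagube")
tagged = tag_entities(sentence, (0, len("Furtuna")), (start, start + len("pagube")))
print(tagged)  # <e1>Furtuna</e1> a provocat mari <e2>pagube</e2>.
```

The result can be passed directly to the marker-conversion step shown in the usage example.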
Limitations
The model was trained on a machine-translated dataset that was validated automatically after translation, so it is not a human gold standard. See the paper for the translation quality analysis.
Citation
@misc{vasile2026crosslingual,
title = {Cross-lingual Relation Extraction with Large Language Models: Zero-Shot, Few-Shot, and Fine-Tuned Evaluation on Romanian},
author = {Vasile, Drago\c{s}-Mitru\c{t} and Apostol, Elena-Simona and Toma, \c{S}tefan-Adrian and Paschke, Adrian and Truic\u{a}, Ciprian-Octavian},
year = {2026},
note = {Preprint}
}