xlmr-base-ro-re: Multilingual (English + Romanian) encoder.

Dragoș Mitruț Vasile · Elena-Simona Apostol · Stefan-Adrian Toma · Adrian Paschke · Ciprian-Octavian Truică

This model is xlm-roberta-base (278M parameters) fine-tuned for relation classification on a Romanian translation of SemEval-2010 Task 8. The encoder is multilingual and handles both English and Romanian input.

Code: github.com/DS4AI-UPB/crosslingual-romanian-re.

Results (macro F1-Score, SemEval-2010 Task 8 test set)

Language    Macro F1-Score
English     0.853
Romanian    0.822

How it works

The model takes a sentence in which the two entities are wrapped in <e1>...</e1> and <e2>...</e2> tags. Before tokenization, these tags are mapped to four special tokens ([E1], [/E1], [E2], [/E2]) that were added to the vocabulary during fine-tuning. The classifier head predicts one of 19 directional labels, for example Cause-Effect(e1,e2) versus Cause-Effect(e2,e1): nine relations in two directions each, plus Other. Dropping the direction collapses these to the 10 coarse SemEval relations.
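The collapse from directional labels to coarse relations can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper name `coarse_relation` is hypothetical, and it assumes the label strings follow the `Relation(e1,e2)` pattern used in the Usage section.

```python
import re

# Hypothetical helper (not part of the released code): strips the direction
# suffix "(e1,e2)" or "(e2,e1)" from a label. "Other" has no suffix and is
# returned unchanged.
_DIRECTION = re.compile(r"\((e1,e2|e2,e1)\)$")

def coarse_relation(label: str) -> str:
    return _DIRECTION.sub("", label)

# The nine directed relation names from SemEval-2010 Task 8.
BASE = [
    "Cause-Effect", "Instrument-Agency", "Product-Producer",
    "Content-Container", "Entity-Origin", "Entity-Destination",
    "Component-Whole", "Member-Collection", "Message-Topic",
]

# 9 relations x 2 directions + "Other" = 19 directional labels.
DIRECTIONAL = [f"{r}({d})" for r in BASE for d in ("e1,e2", "e2,e1")] + ["Other"]
COARSE = sorted({coarse_relation(label) for label in DIRECTIONAL})

print(len(DIRECTIONAL), len(COARSE))           # 19 10
print(coarse_relation("Cause-Effect(e2,e1)"))  # Cause-Effect
```

Applying `coarse_relation` to a predicted label gives the undirected relation, which is useful when the direction of the relation does not matter downstream.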

Usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

RELATIONS = [
    "Cause-Effect(e1,e2)", "Cause-Effect(e2,e1)",
    "Instrument-Agency(e1,e2)", "Instrument-Agency(e2,e1)",
    "Product-Producer(e1,e2)", "Product-Producer(e2,e1)",
    "Content-Container(e1,e2)", "Content-Container(e2,e1)",
    "Entity-Origin(e1,e2)", "Entity-Origin(e2,e1)",
    "Entity-Destination(e1,e2)", "Entity-Destination(e2,e1)",
    "Component-Whole(e1,e2)", "Component-Whole(e2,e1)",
    "Member-Collection(e1,e2)", "Member-Collection(e2,e1)",
    "Message-Topic(e1,e2)", "Message-Topic(e2,e1)",
    "Other",
]

tok = AutoTokenizer.from_pretrained("DS4AI-UPB/xlmr-base-ro-re")
model = AutoModelForSequenceClassification.from_pretrained("DS4AI-UPB/xlmr-base-ro-re").eval()

def convert_markers(text):
    # Map the <e1>/<e2> tags to the special tokens the model was fine-tuned with.
    text = text.replace("<e1>", "[E1] ").replace("</e1>", " [/E1]")
    return text.replace("<e2>", "[E2] ").replace("</e2>", " [/E2]")

sentence = "<e1>Furtuna</e1> a provocat mari <e2>pagube</e2>."
inputs = tok(convert_markers(sentence), return_tensors="pt", truncation=True, max_length=192)
with torch.no_grad():
    # The index of the highest logit selects one of the 19 labels in RELATIONS.
    pred = model(**inputs).logits.argmax(-1).item()
print(RELATIONS[pred])   # Cause-Effect(e1,e2)

A ready-to-use script is available as infer_encoder.py in the code repository.
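When the input sentence does not already carry <e1>/<e2> tags, they have to be added before the markers are converted. The sketch below shows one way to do that from character offsets. The function name `mark_entities` and the (start, end) span convention are assumptions made for illustration and are not part of the released code.

```python
def mark_entities(text, e1_span, e2_span):
    """Wrap two non-overlapping character spans in <e1>/<e2> tags.

    Spans are (start, end) offsets with an exclusive end (an assumed
    convention for this sketch).
    """
    (s1, t1), (s2, t2) = e1_span, e2_span
    if not (0 <= s1 < t1 <= len(text) and 0 <= s2 < t2 <= len(text)):
        raise ValueError("spans must be non-empty and inside the text")
    if s1 < t2 and s2 < t1:
        raise ValueError("entity spans must not overlap")
    # Insert tags from right to left so the earlier offsets stay valid.
    pieces = sorted([(s1, t1, "e1"), (s2, t2, "e2")], reverse=True)
    for start, end, tag in pieces:
        text = f"{text[:start]}<{tag}>{text[start:end]}</{tag}>{text[end:]}"
    return text

sentence = "Furtuna a provocat mari pagube."
print(mark_entities(sentence, (0, 7), (24, 30)))
# <e1>Furtuna</e1> a provocat mari <e2>pagube</e2>.
```

The tagged string has the same form as the example sentence in the snippet above, so it can be passed through the marker conversion and the tokenizer unchanged.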

Limitations

The model was trained on a machine-translated dataset. The translations were validated automatically after translation and are not a human gold standard. See the paper for the translation quality analysis.

Citation

@misc{vasile2026crosslingual,
  title  = {Cross-lingual Relation Extraction with Large Language Models: Zero-Shot, Few-Shot, and Fine-Tuned Evaluation on Romanian},
  author = {Vasile, Drago\c{s}-Mitru\c{t} and Apostol, Elena-Simona and Toma, \c{S}tefan-Adrian and Paschke, Adrian and Truic\u{a}, Ciprian-Octavian},
  year   = {2026},
  note   = {Preprint}
}