SANI for Heterogeneous Scholar Entity Resolution
This model is a fine-tuned checkpoint for heterogeneous scholar entity resolution, designed to determine whether two scholar records refer to the same real-world person.
The model is based on XLM-RoBERTa and introduces Soft-Aligned Attentive Neighborhood Injection (SANI). Instead of relying only on the attributes of a candidate pair, SANI retrieves neighboring scholar records, softly aligns their contextual signals with the target scholar representation, and injects the aggregated neighborhood evidence into the encoder. This helps the model handle difficult cases such as multilingual names, abbreviated or reversed names, missing attributes, homonyms, and affiliation changes over time.
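The description above does not give SANI's equations, so the following is only a generic sketch of the "softly align, aggregate, inject" idea, not the authors' implementation. It assumes that scholar records are already encoded as fixed-length vectors, that soft alignment is scaled dot-product attention from the target over its neighbors, and that injection is a residual addition. The function name `soft_align_and_inject` and the `temperature` parameter are hypothetical.

```python
import math

def soft_align_and_inject(target, neighbors, temperature=1.0):
    """Attention-weighted neighbor aggregation added to the target vector.

    Assumption: dot-product attention plus residual addition stands in for
    SANI's alignment and injection steps, which the model card does not specify.
    Returns (injected_vector, attention_weights).
    """
    if not neighbors:
        # No neighborhood evidence: leave the target representation unchanged.
        return list(target), []

    # Soft alignment: scaled dot-product score between target and each neighbor.
    scale = math.sqrt(len(target)) * temperature
    scores = [sum(t * n for t, n in zip(target, nb)) / scale for nb in neighbors]

    # Numerically stable softmax turns scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Aggregation: weighted sum of neighbor vectors, one dimension at a time.
    aggregated = [
        sum(w * nb[i] for w, nb in zip(weights, neighbors))
        for i in range(len(target))
    ]

    # Injection (assumed): residual addition onto the target representation.
    injected = [t + a for t, a in zip(target, aggregated)]
    return injected, weights
```

In this sketch a neighbor that resembles the target receives a larger weight, so a noisy or unrelated neighbor contributes less to the injected representation than it would under a plain average.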
Intended Use
The model is intended for binary scholar record matching:
- Input: two serialized scholar records containing fields such as name, affiliation, research interests, papers, and projects.
- Output: a match / non-match prediction indicating whether the two records describe the same scholar.
Example input format for a single record, with each field name introduced by COL and its value introduced by VAL:
COL Name VAL ... COL Affiliation VAL ... COL Research Interests VAL ... COL Projects VAL ... COL Papers VAL ...
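A record can be turned into this format with a few lines of Python. The sketch below is a minimal illustration: the helper name `serialize_record`, the dict-based record layout, the "; " separator for list-valued fields, and the choice to emit an empty value for a missing field are all assumptions, not part of the released model. Only the field names and their order come from the example above.

```python
# Field names and order follow the example input format in this model card.
FIELD_ORDER = ["Name", "Affiliation", "Research Interests", "Projects", "Papers"]

def serialize_record(record: dict) -> str:
    """Serialize one scholar record as 'COL <field> VAL <value>' segments.

    Hypothetical helper: the handling of missing fields and list values
    is an assumption made for illustration.
    """
    parts = []
    for field in FIELD_ORDER:
        value = record.get(field, "")
        # Assumption: list-valued fields (e.g. papers) are joined with "; ".
        if isinstance(value, (list, tuple)):
            value = "; ".join(str(v) for v in value)
        # Collapse internal whitespace so each record stays on one line.
        value = " ".join(str(value).split())
        # A missing field keeps its COL/VAL marker with an empty value.
        parts.append(f"COL {field} VAL {value}".strip())
    return " ".join(parts)
```

Serializing both records of a candidate pair this way yields the two strings that the matcher compares.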