inference: false
language: pt
datasets:
  - assin

mDeBERTa base for Semantic Textual Similarity

This is the microsoft/mdeberta-v3-base model fine-tuned for Semantic Textual Similarity on the ASSIN dataset. It is intended for Portuguese text.

Full regression example

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "ruanchaves/mdeberta-v3-base-assin-similarity"
s1 = "A gente faz o aporte financeiro, é como se a empresa fosse parceira do Monte Cristo."
s2 = "Fernando Moraes afirma que não tem vínculo com o Monte Cristo além da parceira."

# Load the fine-tuned regression model and its tokenizer
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Encode the two sentences together as a single sentence pair
model_input = tokenizer([s1], [s2], padding=True, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    output = model(**model_input)

# The model returns one logit per pair: the predicted similarity score
score = output.logits[0].item()
print(f"Similarity Score: {round(score, 4)}")

Output:

Similarity Score: 2.0592
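
ASSIN annotates sentence similarity on a scale from 1 to 5, and a regression head is unbounded, so a raw score can land slightly outside that range. The following is a minimal sketch of one way to clip a score and rescale it to the interval [0, 1]. The helper `to_unit_interval` and its parameter names are hypothetical and are not part of this model or of the transformers library.

```python
# Hypothetical helper (not part of the model or transformers):
# clips a raw regression score to the ASSIN range and rescales it to [0, 1].
ASSIN_MIN = 1.0  # lowest similarity label in ASSIN
ASSIN_MAX = 5.0  # highest similarity label in ASSIN


def to_unit_interval(score: float, low: float = ASSIN_MIN, high: float = ASSIN_MAX) -> float:
    """Clip `score` to [low, high], then map it linearly onto [0, 1]."""
    clipped = max(low, min(high, score))
    return (clipped - low) / (high - low)


# Applied to the score printed by the example above
print(round(to_unit_interval(2.0592), 4))  # 0.2648
```

Whether to rescale at all depends on the downstream use. The raw score is directly comparable to the ASSIN labels, while the rescaled value may be more convenient for thresholding.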

Citation

Our research is ongoing, and we are currently describing our experiments in a paper that will be published soon. In the meantime, if you would like to cite our work or models before the paper is published, please cite our GitHub repository:

@software{Chaves_Rodrigues_eplm_2023,
author = {Chaves Rodrigues, Ruan and Tanti, Marc and Agerri, Rodrigo},
doi = {10.5281/zenodo.7781848},
month = {3},
title = {{eplm}},
url = {https://github.com/ruanchaves/eplm},
version = {1.0.0},
year = {2023}
}