ruanchaves committed on
Commit 16c6c7a
1 Parent(s): a41cca1

Create README.md

Files changed (1)
  1. README.md +59 -0
README.md ADDED
@@ -0,0 +1,59 @@
---
inference: false
language: pt
datasets:
- assin2
---

# BERTimbau base for Semantic Textual Similarity

This is the [neuralmind/bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) model fine-tuned for
Semantic Textual Similarity on the [ASSIN 2](https://huggingface.co/datasets/assin2) dataset.
The model is intended for Portuguese text.

- Git Repo: [Evaluation of Portuguese Language Models](https://github.com/ruanchaves/eplm)
- Demo: [Portuguese Semantic Similarity](https://ruanchaves-portuguese-semantic-similarity.hf.space)
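
For a quick end-to-end check, the model can also be called through the Transformers `text-classification` pipeline. The snippet below is a minimal sketch rather than an official usage example: it passes the sentence pair from the full example that follows as a `text`/`text_pair` dictionary and sets `function_to_apply="none"` so the pipeline returns the raw regression score.

```python
from transformers import pipeline

similarity = pipeline(
    "text-classification",
    model="ruanchaves/bert-base-portuguese-cased-assin2-similarity",
    function_to_apply="none",  # return the raw regression output instead of a sigmoid
)

# Sentence pairs are passed as a dictionary with "text" and "text_pair" keys.
result = similarity({
    "text": "A gente faz o aporte financeiro, é como se a empresa fosse parceira do Monte Cristo.",
    "text_pair": "Fernando Moraes afirma que não tem vínculo com o Monte Cristo além da parceira.",
})
print(result)  # e.g. {'label': 'LABEL_0', 'score': <raw similarity>}
```
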
## Full regression example

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import numpy as np
import torch

model_name = "ruanchaves/bert-base-portuguese-cased-assin2-similarity"
s1 = "A gente faz o aporte financeiro, é como se a empresa fosse parceira do Monte Cristo."
s2 = "Fernando Moraes afirma que não tem vínculo com o Monte Cristo além da parceira."

model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Encode the sentence pair as a single cross-encoder input.
model_input = tokenizer([s1], [s2], padding=True, return_tensors="pt")
with torch.no_grad():
    output = model(**model_input)
    # The regression head emits one similarity score per sentence pair.
    score = output[0][0].detach().numpy().item()
    print(f"Similarity Score: {np.round(float(score), 4)}")
```

Output:

```
Similarity Score: 3.1819
```
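
The predicted value follows the ASSIN 2 annotation scale, which runs from 1 (completely different sentences) to 5 (essentially equivalent sentences). Where a value in [0, 1] is more convenient, a simple min-max rescaling can be applied; the helper below is a hypothetical convenience function, not part of the model or dataset:

```python
def normalize_assin2_score(score: float, low: float = 1.0, high: float = 5.0) -> float:
    """Min-max rescale an ASSIN 2 similarity score from [low, high] to [0, 1]."""
    return min(max((score - low) / (high - low), 0.0), 1.0)

print(normalize_assin2_score(3.1819))  # 0.5455 (rounded)
```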

## Citation

Our research is ongoing, and a paper describing our experiments is in preparation.
In the meantime, if you would like to cite our work or models before the paper is published, please cite our [GitHub repository](https://github.com/ruanchaves/eplm):

```bibtex
@software{Chaves_Rodrigues_eplm_2023,
  author = {Chaves Rodrigues, Ruan and Tanti, Marc and Agerri, Rodrigo},
  doi = {10.5281/zenodo.7781848},
  month = {3},
  title = {{eplm}},
  url = {https://github.com/ruanchaves/eplm},
  version = {1.0.0},
  year = {2023}
}
```