rufimelo committed
Commit 33af9d7
Parent: 947975d

Update README.md

Files changed (1)
1. README.md +39 -24
README.md CHANGED
@@ -1,3 +1,4 @@
+
  ---
  language:
  - pt
@@ -17,12 +18,26 @@ widget:
  - "O juíz leu o recurso."
  - "O juíz atirou uma pedra."
  example_title: "Example 1"
- metrics:
- - bleu
+ model-index:
+ - name: BERTimbau
+   results:
+   - task:
+       name: STS
+       type: STS
+     metrics:
+     - name: Pearson Correlation - assin Dataset
+       type: Pearson Correlation
+       value: 0.76629
+     - name: Pearson Correlation - assin2 Dataset
+       type: Pearson Correlation
+       value: 0.82357
+     - name: Pearson Correlation - stsb_multi_mt pt Dataset
+       type: Pearson Correlation
+       value: 0.79120
  ---
- # rufimelo/Legal-SBERTimbau-sts-large
+ # rufimelo/Legal-BERTimbau-sts-large
  This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for tasks like clustering or semantic search.
- rufimelo/Legal-SBERTimbau-sts-large is based on Legal-BERTimbau-large which derives from [BERTimbau](https://huggingface.co/neuralmind/bert-large-portuguese-cased) alrge.
+ rufimelo/Legal-BERTimbau-sts-large is based on Legal-BERTimbau-large, which derives from [BERTimbau](https://huggingface.co/neuralmind/bert-large-portuguese-cased) large.
  It is adapted to the Portuguese legal domain and trained for STS on Portuguese datasets.
  ## Usage (Sentence-Transformers)
  Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
@@ -34,7 +49,7 @@ Then you can use the model like this:
  from sentence_transformers import SentenceTransformer
  sentences = ["Isto é um exemplo", "Isto é um outro exemplo"]
  
- model = SentenceTransformer('rufimelo/Legal-SBERTimbau-sts-large')
+ model = SentenceTransformer('rufimelo/Legal-BERTimbau-sts-large')
  embeddings = model.encode(sentences)
  print(embeddings)
  ```
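The usage snippet in this hunk stops at printing raw embeddings. For the clustering and semantic-search tasks the card names, those vectors are normally compared by cosine similarity; below is a minimal sketch of that step, assuming a sentence-transformers release that ships `util.cos_sim` and reusing the card's widget sentences:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('rufimelo/Legal-BERTimbau-sts-large')

# The two sentences from the card's widget example
sentences = ["O juíz leu o recurso.", "O juíz atirou uma pedra."]

# Encode into 1024-dimensional vectors, then compare by cosine similarity
embeddings = model.encode(sentences)
score = util.cos_sim(embeddings[0], embeddings[1])
print(f"Cosine similarity: {score.item():.4f}")
```

Scores near 1 indicate near-paraphrases; unrelated sentences should land substantially lower.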
@@ -54,8 +69,8 @@ def mean_pooling(model_output, attention_mask):
  sentences = ['This is an example sentence', 'Each sentence is converted']
  
  # Load model from HuggingFace Hub
- tokenizer = AutoTokenizer.from_pretrained('rufimelo/Legal-SBERTimbau-sts-large')
- model = AutoModel.from_pretrained('rufimelo/Legal-SBERTimbau-sts-large')
+ tokenizer = AutoTokenizer.from_pretrained('rufimelo/Legal-BERTimbau-sts-large')
+ model = AutoModel.from_pretrained('rufimelo/Legal-BERTimbau-sts-large')
  
  # Tokenize sentences
  encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
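The hunk header above references `def mean_pooling(model_output, attention_mask):`, whose body falls outside the changed lines. For reference, the stock helper that sentence-transformers model cards pair with this snippet averages the token embeddings, using the attention mask so padding tokens are excluded:

```python
import torch

# Mean pooling: average token embeddings over real (non-padding) tokens
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element: per-token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = torch.sum(token_embeddings * input_mask_expanded, 1)
    counts = torch.clamp(input_mask_expanded.sum(1), min=1e-9)  # avoid division by zero
    return summed / counts
```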
@@ -69,24 +84,24 @@ print("Sentence embeddings:")
69
  print(sentence_embeddings)
70
  ```
71
  ## Evaluation Results STS
72
- | Model| Dataset | PearsonCorrelation |
73
- | ---------------------------------------- | ---------- | ---------- |
74
- | Legal-SBERTimbau-sts-large| Assin | 0.76629 |
75
- | Legal-SBERTimbau-sts-large| Assin2| 0.82357 |
76
- | Legal-SBERTimbau-sts-base| Assin | 0.71457 |
77
- | Legal-SBERTimbau-sts-base| Assin2| 0.73545|
78
- | Legal-SBERTimbau-sts-large-v2| Assin | 0.76299 |
79
- | Legal-SBERTimbau-sts-large-v2| Assin2| 0.81121 |
80
- | Legal-SBERTimbau-sts-large-v2| stsb_multi_mt pt| 0.81726 |
81
- | ---------------------------------------- | ---------- |---------- |
82
- | paraphrase-multilingual-mpnet-base-v2| Assin | 0.71457|
83
- | paraphrase-multilingual-mpnet-base-v2| Assin2| 0.79831 |
84
- | paraphrase-multilingual-mpnet-base-v2| stsb_multi_mt pt| 0.83999 |
85
- | paraphrase-multilingual-mpnet-base-v2 Fine tuned with assin(s)| Assin | 0.77641 |
86
- | paraphrase-multilingual-mpnet-base-v2 Fine tuned with assin(s)| Assin2| 0.79831 |
87
- | paraphrase-multilingual-mpnet-base-v2 Fine tuned with assin(s)| stsb_multi_mt pt| 0.84575 |
88
  ## Training
89
- rufimelo/Legal-SBERTimbau-sts-large is based on Legal-BERTimbau-largewhich derives from [BERTimbau](https://huggingface.co/neuralmind/bert-base-portuguese-cased) large.
90
  It was trained for Semantic Textual Similarity, being submitted to a fine tuning stage with the [assin](https://huggingface.co/datasets/assin) and [assin2](https://huggingface.co/datasets/assin2) datasets.
91
  ## Full Model Architecture
92
  ```
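A note on the evaluation tables above: each figure is the Pearson correlation between the model's predicted similarities (cosine of the two sentence embeddings) and the dataset's gold similarity labels. A minimal sketch of that protocol, assuming `scipy` is installed and using made-up gold scores purely for illustration (real runs use the assin, assin2, and stsb_multi_mt pt test sets):

```python
from scipy.stats import pearsonr
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('rufimelo/Legal-BERTimbau-sts-large')

# Illustrative sentence pairs with made-up gold similarity scores in [0, 1]
pairs = [
    ("O juíz leu o recurso.", "O recurso foi lido pelo juíz."),
    ("O juíz leu o recurso.", "O juíz atirou uma pedra."),
    ("O juíz atirou uma pedra.", "Uma pedra foi atirada."),
]
gold = [0.9, 0.2, 0.8]

# Predicted similarity: cosine between the two embeddings of each pair
pred = [util.cos_sim(model.encode(a), model.encode(b)).item() for a, b in pairs]

# Pearson correlation between predictions and gold labels
corr, _ = pearsonr(pred, gold)
print(f"Pearson correlation: {corr:.5f}")
```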
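The Training section states that the model went through an STS fine-tuning stage on assin and assin2, but the card ships no training code. Below is a minimal sketch of such a stage using sentence-transformers' `CosineSimilarityLoss`; the starting checkpoint name is hypothetical, and rescaling assin's 1-5 relatedness scores to [0, 1] is an assumption:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Hypothetical starting checkpoint: the Legal-BERTimbau-large model the card describes
model = SentenceTransformer('rufimelo/Legal-BERTimbau-large')

# assin-style pairs; relatedness scores on a 1-5 scale rescaled to [0, 1] (assumption)
train_examples = [
    InputExample(texts=["O juíz leu o recurso.", "O recurso foi lido pelo juíz."], label=4.5 / 5.0),
    InputExample(texts=["O juíz leu o recurso.", "O juíz atirou uma pedra."], label=1.5 / 5.0),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# CosineSimilarityLoss regresses the cosine of the two embeddings onto the label
train_loss = losses.CosineSimilarityLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```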