rufimelo committed
Commit 255ec75
1 Parent(s): 10fc964

Update README.md

Files changed (1): README.md (+7 -7)
README.md CHANGED

@@ -11,7 +11,7 @@ tags:
 datasets:
 - assin
 - assin2
-- stjiris/portuguese-legal-sentences-v0
+- stjiris/portuguese-legal-sentences-v1.0
 widget:
 - source_sentence: "O advogado apresentou as provas ao juiz."
 sentences:
@@ -36,11 +36,11 @@ model-index:
 type: Pearson Correlation
 value: 0.8249826985133595
 ---
-# stjiris/bert-large-portuguese-cased-legal-mlm-sts-v0
+# stjiris/bert-large-portuguese-cased-legal-mlm-sts-v1.0
 This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for tasks like clustering or semantic search.
-stjiris/bert-large-portuguese-cased-legal-mlm-sts-v0 derives from [BERTimbau](https://huggingface.co/neuralmind/bert-large-portuguese-cased) large.
+stjiris/bert-large-portuguese-cased-legal-mlm-sts-v1.0 derives from [BERTimbau](https://huggingface.co/neuralmind/bert-large-portuguese-cased) large.
 
-It was trained using the MLM technique, with a learning rate of 3e-5, on [legal sentences from ~30,000 documents](https://huggingface.co/datasets/stjiris/portuguese-legal-sentences-v0) for 130k training steps (the best performance for our semantic search system implementation).
+It was trained using the MLM technique, with a learning rate of 3e-5, on [legal sentences from ~30,000 documents](https://huggingface.co/datasets/stjiris/portuguese-legal-sentences-v1.0) for 130k training steps (the best performance for our semantic search system implementation).
 
 It is adapted to the Portuguese legal domain and trained for STS on the Portuguese datasets [assin](https://huggingface.co/datasets/assin), [assin2](https://huggingface.co/datasets/assin2), and the Portuguese subset of [stsb_multi_mt](https://huggingface.co/datasets/stsb_multi_mt).
 
@@ -55,7 +55,7 @@ Then you can use the model like this:
 from sentence_transformers import SentenceTransformer
 sentences = ["Isto é um exemplo", "Isto é um outro exemplo"]
 
-model = SentenceTransformer('stjiris/bert-large-portuguese-cased-legal-mlm-sts-v0')
+model = SentenceTransformer('stjiris/bert-large-portuguese-cased-legal-mlm-sts-v1.0')
 embeddings = model.encode(sentences)
 print(embeddings)
 ```
@@ -75,8 +75,8 @@ def mean_pooling(model_output, attention_mask):
 sentences = ['This is an example sentence', 'Each sentence is converted']
 
 # Load model from HuggingFace Hub
-tokenizer = AutoTokenizer.from_pretrained('stjiris/bert-large-portuguese-cased-legal-mlm-sts-v0')
-model = AutoModel.from_pretrained('stjiris/bert-large-portuguese-cased-legal-mlm-sts-v0')
+tokenizer = AutoTokenizer.from_pretrained('stjiris/bert-large-portuguese-cased-legal-mlm-sts-v1.0')
+model = AutoModel.from_pretrained('stjiris/bert-large-portuguese-cased-legal-mlm-sts-v1.0')
 
 # Tokenize sentences
 encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
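
Note: the last hunk cuts off after the tokenization step. For reference, a minimal sketch of how this Transformers-based example typically continues in sentence-transformers model cards, using the `mean_pooling` helper defined just above the hunk (reproduced here so the sketch is self-contained; this continuation is not part of the commit):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Mean pooling: average token embeddings, weighting by the attention mask
# so padding tokens do not contribute to the sentence vector.
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element holds all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('stjiris/bert-large-portuguese-cased-legal-mlm-sts-v1.0')
model = AutoModel.from_pretrained('stjiris/bert-large-portuguese-cased-legal-mlm-sts-v1.0')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings without gradient tracking
with torch.no_grad():
    model_output = model(**encoded_input)

# Pool token embeddings into one 1024-dimensional vector per sentence
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print(sentence_embeddings)
```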
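Since the card advertises clustering and semantic search as target tasks, a small illustrative sketch of a semantic-search query with the renamed model follows; the corpus sentences are invented placeholders, not drawn from the training data:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('stjiris/bert-large-portuguese-cased-legal-mlm-sts-v1.0')

# Placeholder corpus of Portuguese legal-style sentences
corpus = [
    "O advogado apresentou as provas ao juiz.",
    "O réu foi absolvido por falta de provas.",
]
query = "O juiz recebeu as provas do advogado."

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus sentences by cosine similarity to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit['corpus_id']], hit['score'])
```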