Rui Melo committed on
Commit c5f7cc1
1 Parent(s): b588fd1

initial commit

1_Pooling/config.json ADDED
@@ -0,0 +1,7 @@
+{
+  "word_embedding_dimension": 768,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false
+}
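For context: this config selects plain mean pooling over token embeddings (768-dimensional, matching the BERT hidden size). A minimal, illustrative sketch of the equivalent sentence-transformers Pooling module; the snippet is not part of the commit:

```python
from sentence_transformers import models

# Mirrors 1_Pooling/config.json: mean pooling over 768-dim token embeddings.
pooling = models.Pooling(
    word_embedding_dimension=768,
    pooling_mode_cls_token=False,
    pooling_mode_mean_tokens=True,
    pooling_mode_max_tokens=False,
    pooling_mode_mean_sqrt_len_tokens=False,
)
```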
README.md CHANGED
@@ -1,3 +1,125 @@
 ---
-license: mit
+pipeline_tag: sentence-similarity
+tags:
+- sentence-transformers
+- feature-extraction
+- sentence-similarity
+- transformers
 ---
+
+# {MODEL_NAME}
+
+This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.
+
+<!--- Describe your model here -->
+
+## Usage (Sentence-Transformers)
+
+Using this model is easy when you have [sentence-transformers](https://www.SBERT.net) installed:
+
+```
+pip install -U sentence-transformers
+```
+
+Then you can use the model like this:
+
+```python
+from sentence_transformers import SentenceTransformer
+sentences = ["This is an example sentence", "Each sentence is converted"]
+
+model = SentenceTransformer('{MODEL_NAME}')
+embeddings = model.encode(sentences)
+print(embeddings)
+```
+
+
+## Usage (HuggingFace Transformers)
+Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.
+
+```python
+from transformers import AutoTokenizer, AutoModel
+import torch
+
+
+# Mean pooling: take the attention mask into account for correct averaging
+def mean_pooling(model_output, attention_mask):
+    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
+    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+
+# Sentences we want sentence embeddings for
+sentences = ['This is an example sentence', 'Each sentence is converted']
+
+# Load model from HuggingFace Hub
+tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
+model = AutoModel.from_pretrained('{MODEL_NAME}')
+
+# Tokenize sentences
+encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+
+# Compute token embeddings
+with torch.no_grad():
+    model_output = model(**encoded_input)
+
+# Perform pooling. In this case, mean pooling.
+sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+
+print("Sentence embeddings:")
+print(sentence_embeddings)
+```
+
+
+## Evaluation Results
+
+<!--- Describe how your model was evaluated -->
+
+For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
+
+
+## Training
+The model was trained with the parameters:
+
+**DataLoader**:
+
+`torch.utils.data.dataloader.DataLoader` of length 270 with parameters:
+```
+{'batch_size': 64, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
+```
+
+**Loss**:
+
+`sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss`
+
+Parameters of the fit() method:
+```
+{
+    "epochs": 5,
+    "evaluation_steps": 0,
+    "evaluator": "NoneType",
+    "max_grad_norm": 1,
+    "optimizer_class": "<class 'transformers.optimization.AdamW'>",
+    "optimizer_params": {
+        "lr": 2e-05
+    },
+    "scheduler": "WarmupLinear",
+    "steps_per_epoch": null,
+    "warmup_steps": 135,
+    "weight_decay": 0.01
+}
+```
+
+
+## Full Model Architecture
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
+)
+```
+
+## Citing & Authors
+
+<!--- Describe where people can find more information -->
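For reference, a rough sketch of how the DataLoader, loss, and fit() parameters recorded in this README plug back into the sentence-transformers API. The training pairs shown are placeholders; the actual training data is not part of this commit:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer('{MODEL_NAME}')

# Placeholder pairs with similarity labels; the real data would yield 270 batches of 64.
train_examples = [InputExample(texts=['sentence one', 'sentence two'], label=0.8)]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=64)
train_loss = losses.CosineSimilarityLoss(model)

# Mirrors the fit() parameters above; optimizer_class defaults to transformers' AdamW in v2.2.
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=5,
    scheduler='WarmupLinear',
    warmup_steps=135,
    optimizer_params={'lr': 2e-05},
    weight_decay=0.01,
    max_grad_norm=1,
)
```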
config.json ADDED
@@ -0,0 +1,32 @@
+{
+  "_name_or_path": "rufimelo/Legal-BERTimbau-base",
+  "architectures": [
+    "BertModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "directionality": "bidi",
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "output_past": true,
+  "pad_token_id": 0,
+  "pooler_fc_size": 768,
+  "pooler_num_attention_heads": 12,
+  "pooler_num_fc_layers": 3,
+  "pooler_size_per_head": 128,
+  "pooler_type": "first_token_transform",
+  "position_embedding_type": "absolute",
+  "torch_dtype": "float32",
+  "transformers_version": "4.20.1",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 29794
+}
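The backbone is rufimelo/Legal-BERTimbau-base, a standard 12-layer BERT-base configuration. An illustrative sanity check of these fields after downloading, using transformers' AutoConfig:

```python
from transformers import AutoConfig

# Illustrative: inspect the architecture recorded in config.json.
config = AutoConfig.from_pretrained('{MODEL_NAME}')
assert config.model_type == "bert"
assert config.hidden_size == 768
assert config.num_hidden_layers == 12
```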
config_sentence_transformers.json ADDED
@@ -0,0 +1,7 @@
+{
+  "__version__": {
+    "sentence_transformers": "2.2.0",
+    "transformers": "4.20.1",
+    "pytorch": "1.10.1+cu111"
+  }
+}
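These are the library versions the model was exported with. A quick, illustrative way to compare a local environment against them (exact version matches are usually not required for inference):

```python
# Versions recorded in config_sentence_transformers.json:
# sentence-transformers 2.2.0, transformers 4.20.1, pytorch 1.10.1+cu111.
import torch
import transformers
import sentence_transformers

print(sentence_transformers.__version__)
print(transformers.__version__)
print(torch.__version__)
```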
eval/mse_evaluation_TED2020-en-pt-dev.tsv.gz_results.csv ADDED
@@ -0,0 +1,51 @@
+epoch,steps,MSE
+0,1000,5.367035418748856
+0,2000,5.193383619189262
+0,3000,5.061572417616844
+0,4000,4.889959841966629
+0,5000,4.609957709908485
+0,6000,4.376227408647537
+0,7000,4.126685485243797
+0,8000,3.894694149494171
+0,9000,3.71534526348114
+0,-1,3.620472177863121
+1,1000,3.3613737672567368
+1,2000,3.2121531665325165
+1,3000,3.0989496037364006
+1,4000,2.999489940702915
+1,5000,2.927454374730587
+1,6000,2.853638492524624
+1,7000,2.805120311677456
+1,8000,2.740798704326153
+1,9000,2.7038952335715294
+1,-1,2.6784922927618027
+2,1000,2.6389440521597862
+2,2000,2.608192525804043
+2,3000,2.5770554319024086
+2,4000,2.548805996775627
+2,5000,2.518361434340477
+2,6000,2.505100704729557
+2,7000,2.474398724734783
+2,8000,2.456068992614746
+2,9000,2.4350930005311966
+2,-1,2.429385669529438
+3,1000,2.4136649444699287
+3,2000,2.3951873183250427
+3,3000,2.381756342947483
+3,4000,2.368186227977276
+3,5000,2.3565009236335754
+3,6000,2.345930226147175
+3,7000,2.3331772536039352
+3,8000,2.3244358599185944
+3,9000,2.315283380448818
+3,-1,2.3112069815397263
+4,1000,2.3025305941700935
+4,2000,2.2977473214268684
+4,3000,2.2921686992049217
+4,4000,2.2856738418340683
+4,5000,2.2822346538305283
+4,6000,2.2782722488045692
+4,7000,2.2773338481783867
+4,8000,2.2727908566594124
+4,9000,2.2715413942933083
+4,-1,2.2708337754011154
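The epoch/steps/MSE columns match the CSV written by sentence-transformers' MSEEvaluator, which measures how closely student embeddings track a teacher's embeddings on parallel sentences (sentence-transformers logs the MSE multiplied by 100). A sketch under assumed inputs; the teacher model and the TED2020 dev pairs are not part of this commit:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import MSEEvaluator

# Hypothetical teacher and parallel en-pt sentences; placeholders only.
teacher = SentenceTransformer('a-teacher-model')
student = SentenceTransformer('{MODEL_NAME}')
en = ['This is an example sentence']
pt = ['Isto é uma frase de exemplo']

evaluator = MSEEvaluator(en, pt, teacher_model=teacher, name='TED2020-en-pt-dev')
evaluator(student)  # writes epoch,steps,MSE rows like the ones above
```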
eval/similarity_evaluation_STS.en-en.txt_results.csv ADDED
@@ -0,0 +1,51 @@
+epoch,steps,cosine_pearson,cosine_spearman,euclidean_pearson,euclidean_spearman,manhattan_pearson,manhattan_spearman,dot_pearson,dot_spearman
+0,1000,0.5140056805201973,0.5890807953828568,0.559515868293356,0.5922912753774262,0.5451349325359995,0.5819437046190675,0.22539237809823698,0.22321946160517406
+0,2000,0.44619621291626504,0.5540864865627407,0.5162284484313594,0.5612347248265113,0.5150404333011369,0.5623890676693863,-0.03925888866920268,-0.0475617696609267
+0,3000,0.43431388438460883,0.5435586338219284,0.5104576500180034,0.5460433731280702,0.5106740455350732,0.5445945825270726,0.0800124192733146,0.043582496564209525
+0,4000,0.4815789858416121,0.5789254229000352,0.5474475569981547,0.5914225391719992,0.5464185006481518,0.589996427967515,0.16963048197098604,0.1883327838193587
+0,5000,0.5294007839824181,0.5791329970376151,0.567137936153863,0.586330438059923,0.5649011349039395,0.5854009671994264,0.26817398324585057,0.2920114527819132
+0,6000,0.5463603874314922,0.5734973592023205,0.5686220459321023,0.5703722152420895,0.5685273926188351,0.5723764588593888,0.29675469571899443,0.2705229167837791
+0,7000,0.5469980476553145,0.5754608567852059,0.5701478329200861,0.5696883737777289,0.5730245680102316,0.573734916270884,0.2881408596960359,0.2756284717751429
+0,8000,0.5838914267430435,0.6075437461274882,0.5854000552991829,0.5852629688375908,0.5866926240554013,0.5858368728883442,0.31497607283965356,0.31274042119512513
+0,9000,0.6082456344854708,0.6231621623941724,0.6021977865748874,0.5993860825205976,0.601416194932199,0.5996405530374085,0.3162368894399329,0.32753507565285805
+0,-1,0.6340241303458328,0.6488629157989745,0.6272089670611783,0.6212413328284562,0.6261247664573459,0.6231917609286052,0.3614941847538112,0.37414354186712145
+1,1000,0.6489685011407933,0.667540359819105,0.6420782114622586,0.6385622414198467,0.6414950991944978,0.6386564185748599,0.41926990072233455,0.44001873201431163
+1,2000,0.6501563072131639,0.6724291151556631,0.6478906595409323,0.6482440373517454,0.6471718858768877,0.6482186671793746,0.46287014167446283,0.485352539265214
+1,3000,0.6604112169020907,0.6875120670267013,0.6634056718084428,0.6645136213759489,0.6626030154646603,0.6639950704285503,0.48555192969433814,0.5041191632851935
+1,4000,0.6623806223921189,0.6924842364148425,0.6646110249594901,0.6652581974953791,0.6644709604750156,0.6659943168906857,0.528495364233784,0.5462351870070561
+1,5000,0.6850816606891217,0.7075345145790395,0.6806572905222198,0.678914269368831,0.6797787684901784,0.6797999190225054,0.5479364502980393,0.5600915294836176
+1,6000,0.6851700329054995,0.7078185836302832,0.6874242074031183,0.6846771424625312,0.686830327548116,0.6857607563400085,0.5694312299436508,0.5769857579033162
+1,7000,0.6890474363339373,0.7154811444793906,0.6931583436684235,0.6908082674521603,0.6927393578265696,0.6917877098642965,0.579360228173903,0.5864995725423955
+1,8000,0.7008481781396507,0.730200457213113,0.70873346507043,0.7077839879406864,0.7089441297196204,0.7086238944047829,0.5928760319501843,0.5999522986403294
+1,9000,0.706014592098889,0.7314924140212724,0.7128013776176435,0.714079250257624,0.7132612508970737,0.7159108998234912,0.6054494128829728,0.6106827283635577
+1,-1,0.7200253926417177,0.7398388163347404,0.7188000262780588,0.7147411811185732,0.7187261585057005,0.7175292093035124,0.6156732874019484,0.6163637249918793
+2,1000,0.7178886037426083,0.73718993970129,0.7188454644618073,0.7163179757710785,0.7194434859869421,0.7170467916319146,0.6393994461966461,0.6420075877071223
+2,2000,0.7299703637244253,0.7495332973528245,0.7277494458275431,0.727454712648792,0.727480829295701,0.7270045842874844,0.6458760444542806,0.6473533905429072
+2,3000,0.734261157337224,0.753223504243134,0.7330394673578884,0.7315831316073258,0.7331244904533004,0.7331099547081914,0.6513263507254936,0.654268300100176
+2,4000,0.7343011501722011,0.7521441187277185,0.7332802694216066,0.7300190220410061,0.7333951460801283,0.7306352097123775,0.6484172411241667,0.6530839743263172
+2,5000,0.742889074497301,0.7578270373387953,0.7368161790211505,0.7330780497944523,0.7366270428032792,0.7334839725523863,0.662337265771508,0.668018933525192
+2,6000,0.7431224743771536,0.758464751217027,0.7395656885060564,0.7367328922020631,0.7391788408718201,0.736238173840831,0.6692784158683307,0.6753332310990264
+2,7000,0.7456805796696031,0.7601338010417898,0.7391593091221383,0.7367286638400012,0.7389669412990082,0.7370692391842527,0.6704350477276445,0.675819492736135
+2,8000,0.7484436749852499,0.7616302568151206,0.7449520469152102,0.7437711930522265,0.7445979805880989,0.7435405551215821,0.6783951732730306,0.6837830360847332
+2,9000,0.7515812487144248,0.7653300736192071,0.7479981515666386,0.7463912399443463,0.7477166755497039,0.7468763483918016,0.6711506749052628,0.6750929832546052
+2,-1,0.7542529351113809,0.7673316264606489,0.7496156203554739,0.7468824987366188,0.749008069829383,0.7469482305468524,0.6818984216075609,0.686199737201335
+3,1000,0.7611521834139107,0.7740116697252114,0.7536009182598888,0.7521679513138851,0.7531755451605359,0.7523513084687473,0.6892740827791818,0.6931277162413401
+3,2000,0.7586248161621655,0.7701569411110422,0.755909895602228,0.7544185931204229,0.7548875499498792,0.7537097658802425,0.6879328540486022,0.6938988157227944
+3,3000,0.7596638492114828,0.7707911954203143,0.7538348247790335,0.7529963258814494,0.7525016100495691,0.7513153597636032,0.6932043864798306,0.6983043845946526
+3,4000,0.7613138031064591,0.7727527710204444,0.7571464667789152,0.7562071902725698,0.75597648205047,0.7563336567378731,0.6938998652971966,0.6962067326154422
+3,5000,0.7650309606568694,0.7768146893756425,0.7594812570831899,0.7602564235415824,0.7590889841536251,0.7594991623359669,0.6978314025203474,0.7022659754500203
+3,6000,0.7664952424832512,0.7790511085097905,0.7618506923044225,0.7615095562980834,0.7604737238388333,0.7618524380216414,0.7028292027901824,0.7054530072549742
+3,7000,0.7663263064442136,0.777784906270553,0.7605793391945297,0.76110747750566,0.7593518137572904,0.7596444642322727,0.7090311202732736,0.710636594746206
+3,8000,0.7736767057917926,0.7843703879835514,0.7661449138740201,0.7653715884467232,0.765197369764307,0.7645774251722043,0.7118985320428509,0.7171817148213416
+3,9000,0.7702620952806344,0.7798033725602421,0.7627031514117696,0.7628783924164577,0.7618706673707889,0.7619888987972725,0.7116935091812764,0.7166508631843085
+3,-1,0.7741045322811847,0.7846164017762386,0.766426143295265,0.7672316833573697,0.7656662248256637,0.7657060134461573,0.7175361585482368,0.7232432640352262
+4,1000,0.7735083894838758,0.783234880571679,0.7675561452748201,0.768252640597022,0.7668961139319035,0.7669191689613465,0.7132638654641691,0.7171378936145191
+4,2000,0.7725586129136159,0.7815166279883786,0.7650998024971205,0.76562106180837,0.764474806234657,0.763171302588376,0.7167715021505252,0.7228265781738621
+4,3000,0.7748299500771854,0.7856911745330413,0.7683797677065075,0.7697556311117211,0.7675892533909863,0.7666677736169443,0.7149058735588968,0.7214984880899017
+4,4000,0.7733142656422624,0.7829665717790294,0.7697179531436572,0.7706212921447395,0.7687669037730513,0.7688492240442888,0.7129858054389775,0.7196264768861717
+4,5000,0.7727933114079547,0.7833551966921651,0.7680052284404755,0.7678947674079721,0.7670476312702246,0.7664794193069181,0.7161797854781118,0.7222991861057886
+4,6000,0.7779233136930357,0.7875170581506424,0.7720112160776631,0.7722818852453789,0.7712775067154483,0.7710914091267029,0.7184599484089325,0.7252632679111196
+4,7000,0.7762983351044147,0.7866387120314384,0.770285872163689,0.7706720324894812,0.76963801099933,0.7707823542996394,0.7172870884685507,0.723367808517774
+4,8000,0.7759104338027587,0.7852256703096908,0.7699957777935053,0.7699055457666398,0.7692556775653562,0.7698498082667341,0.718403828898744,0.7242734467921043
+4,9000,0.7761319595948091,0.7863454174629692,0.7707770511912085,0.770702399817016,0.7699469552447272,0.7706581942136426,0.7189941981604312,0.7253793556695439
+4,-1,0.7758264479442808,0.7863123593595768,0.7705611226987729,0.7702111410247436,0.7697432047898018,0.7699178464562743,0.7189547164729727,0.7252978636007161
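These columns are the Pearson/Spearman correlations that sentence-transformers' EmbeddingSimilarityEvaluator reports for cosine, Euclidean, Manhattan, and dot-product similarity against gold scores. A minimal sketch with made-up STS-style pairs (the real evaluation data is not in this commit):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer('{MODEL_NAME}')

# Made-up pairs with gold similarity scores in [0, 1]; placeholders only.
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=['A man is eating food.'],
    sentences2=['A man is eating a piece of bread.'],
    scores=[0.8],
    name='STS.en-en',
)
evaluator(model)  # appends one row like the ones above per evaluation call
```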
eval/translation_evaluation_TED2020-en-pt-dev.tsv.gz_results.csv ADDED
@@ -0,0 +1,51 @@
+epoch,steps,src2trg,trg2src
+0,1000,0.045,0.185
+0,2000,0.055,0.108
+0,3000,0.093,0.139
+0,4000,0.21,0.248
+0,5000,0.426,0.438
+0,6000,0.584,0.613
+0,7000,0.718,0.719
+0,8000,0.808,0.814
+0,9000,0.856,0.879
+0,-1,0.887,0.895
+1,1000,0.92,0.923
+1,2000,0.93,0.929
+1,3000,0.939,0.936
+1,4000,0.946,0.947
+1,5000,0.954,0.951
+1,6000,0.957,0.958
+1,7000,0.959,0.957
+1,8000,0.964,0.959
+1,9000,0.966,0.962
+1,-1,0.966,0.964
+2,1000,0.966,0.965
+2,2000,0.969,0.966
+2,3000,0.969,0.968
+2,4000,0.971,0.97
+2,5000,0.972,0.971
+2,6000,0.972,0.972
+2,7000,0.973,0.972
+2,8000,0.973,0.97
+2,9000,0.973,0.97
+2,-1,0.972,0.971
+3,1000,0.973,0.973
+3,2000,0.975,0.971
+3,3000,0.975,0.971
+3,4000,0.973,0.971
+3,5000,0.973,0.973
+3,6000,0.974,0.974
+3,7000,0.974,0.973
+3,8000,0.974,0.976
+3,9000,0.973,0.974
+3,-1,0.973,0.975
+4,1000,0.975,0.976
+4,2000,0.974,0.975
+4,3000,0.975,0.974
+4,4000,0.974,0.975
+4,5000,0.974,0.975
+4,6000,0.974,0.976
+4,7000,0.974,0.976
+4,8000,0.974,0.976
+4,9000,0.974,0.976
+4,-1,0.974,0.976
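src2trg and trg2src are the two accuracies written by sentence-transformers' TranslationEvaluator: for each source sentence it checks whether the aligned translation is its nearest neighbour by cosine similarity, and vice versa. A sketch under the same placeholder assumptions as before:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TranslationEvaluator

model = SentenceTransformer('{MODEL_NAME}')

# Placeholder parallel sentences; the TED2020 dev set itself is not in this commit.
evaluator = TranslationEvaluator(
    source_sentences=['This is an example sentence'],
    target_sentences=['Isto é uma frase de exemplo'],
    name='TED2020-en-pt-dev',
)
evaluator(model)  # returns the mean of src2trg and trg2src accuracy
```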
modules.json ADDED
@@ -0,0 +1,14 @@
+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  }
+]
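modules.json wires the two modules together: module 0 is the Transformer backbone at the repository root, module 1 the mean-pooling layer in 1_Pooling/. SentenceTransformer('{MODEL_NAME}') reads this file automatically; assembling the same stack by hand would look roughly like:

```python
from sentence_transformers import SentenceTransformer, models

# Module 0: BERT backbone (see config.json); module 1: mean pooling (see 1_Pooling/config.json).
word_embedding_model = models.Transformer('{MODEL_NAME}', max_seq_length=128)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),  # 768
    pooling_mode_mean_tokens=True,
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```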
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fa463de066e07c9105cf2565f31f553fcf919054cb93aa46fef22ae2542d3ca3
+size 435761969
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+{
+  "max_seq_length": 128,
+  "do_lower_case": false
+}
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
+{
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": false,
+  "mask_token": "[MASK]",
+  "name_or_path": "rufimelo/Legal-BERTimbau-base",
+  "never_split": null,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "special_tokens_map_file": "/home/ruimelo/.cache/huggingface/transformers/eecc45187d085a1169eed91017d358cc0e9cbdd5dc236bcd710059dbf0a2f816.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff