Upload 12 files
- 1_Pooling/config.json +7 -0
- README.md +131 -3
- config.json +32 -0
- config_sentence_transformers.json +7 -0
- eval/Information-Retrieval_evaluation_eval_results.csv +16 -0
- modules.json +14 -0
- pytorch_model.bin +3 -0
- sentence_bert_config.json +4 -0
- special_tokens_map.json +7 -0
- tokenizer.json +0 -0
- tokenizer_config.json +15 -0
- vocab.txt +0 -0
1_Pooling/config.json
ADDED
@@ -0,0 +1,7 @@
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false
}
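The four `pooling_mode_*` flags above choose how per-token vectors are collapsed into one sentence vector. The following is a minimal pure-Python sketch of that idea, not the library's implementation: the function name `pool`, the toy 2-dimensional vectors, and the order in which enabled modes are concatenated are all assumptions made for illustration.

```python
import math

# Flags copied from 1_Pooling/config.json; only mean pooling is enabled.
POOLING_CONFIG = {
    "pooling_mode_cls_token": False,
    "pooling_mode_mean_tokens": True,
    "pooling_mode_max_tokens": False,
    "pooling_mode_mean_sqrt_len_tokens": False,
}


def pool(token_embeddings, attention_mask, config):
    """Collapse per-token vectors into one sentence vector (illustrative sketch)."""
    # Keep only real tokens; padding positions have mask value 0.
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    dim = len(token_embeddings[0])
    n = max(len(kept), 1)  # guard against division by zero
    sums = [sum(vec[d] for vec in kept) for d in range(dim)]

    parts = []
    if config["pooling_mode_cls_token"]:
        parts += list(token_embeddings[0])  # the first token is [CLS]
    if config["pooling_mode_max_tokens"]:
        # Assumes at least one unmasked token.
        parts += [max(vec[d] for vec in kept) for d in range(dim)]
    if config["pooling_mode_mean_tokens"]:
        parts += [s / n for s in sums]
    if config["pooling_mode_mean_sqrt_len_tokens"]:
        parts += [s / math.sqrt(n) for s in sums]
    return parts


# Toy input: two real tokens and one padding token, 2 dimensions instead of 768.
toy_tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
toy_mask = [1, 1, 0]
sentence_vector = pool(toy_tokens, toy_mask, POOLING_CONFIG)
print(sentence_vector)  # [2.0, 3.0]
```

The padding token is ignored because of the mask, which is the same reason the `mean_pooling` helper in the README multiplies by the expanded attention mask before averaging.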
README.md
CHANGED
@@ -1,3 +1,131 @@
----
-
-
+---
+pipeline_tag: sentence-similarity
+tags:
+- sentence-transformers
+- feature-extraction
+- sentence-similarity
+- transformers
+
+---
+
+# {MODEL_NAME}
+
+This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
+
+<!--- Describe your model here -->
+
+## Usage (Sentence-Transformers)
+
+Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
+
+```
+pip install -U sentence-transformers
+```
+
+Then you can use the model like this:
+
+```python
+from sentence_transformers import SentenceTransformer
+sentences = ["This is an example sentence", "Each sentence is converted"]
+
+model = SentenceTransformer('{MODEL_NAME}')
+embeddings = model.encode(sentences)
+print(embeddings)
+```
+
+
+
+## Usage (HuggingFace Transformers)
+Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings.
+
+```python
+from transformers import AutoTokenizer, AutoModel
+import torch
+
+
+#Mean Pooling - Take attention mask into account for correct averaging
+def mean_pooling(model_output, attention_mask):
+    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
+    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+
+# Sentences we want sentence embeddings for
+sentences = ['This is an example sentence', 'Each sentence is converted']
+
+# Load model from HuggingFace Hub
+tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
+model = AutoModel.from_pretrained('{MODEL_NAME}')
+
+# Tokenize sentences
+encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+
+# Compute token embeddings
+with torch.no_grad():
+    model_output = model(**encoded_input)
+
+# Perform pooling. In this case, mean pooling.
+sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+
+print("Sentence embeddings:")
+print(sentence_embeddings)
+```
+
+
+
+## Evaluation Results
+
+<!--- Describe how your model was evaluated -->
+
+For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
+
+
+## Training
+The model was trained with the parameters:
+
+**DataLoader**:
+
+`torch.utils.data.dataloader.DataLoader` of length 12580 with parameters:
+```
+{'batch_size': 16, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
+```
+
+**Loss**:
+
+`sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss` with parameters:
+```
+{'scale': 20.0, 'similarity_fct': 'cos_sim'}
+```
+
+Parameters of the fit()-Method:
+```
+{
+    "epochs": 5,
+    "evaluation_steps": 5000,
+    "evaluator": "sentence_transformers.evaluation.InformationRetrievalEvaluator.InformationRetrievalEvaluator",
+    "max_grad_norm": 1,
+    "optimizer_class": "<class 'transformers.optimization.AdamW'>",
+    "optimizer_params": {
+        "correct_bias": false,
+        "eps": 1e-06,
+        "lr": 2e-05
+    },
+    "scheduler": "WarmupLinear",
+    "steps_per_epoch": null,
+    "warmup_steps": 6290,
+    "weight_decay": 0.01
+}
+```
+
+
+## Full Model Architecture
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 350, 'do_lower_case': False}) with Transformer model: BertModel
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
+)
+```
+
+## Citing & Authors
+
+<!--- Describe where people can find more information -->
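The Training section of the README names a `MultipleNegativesRankingLoss` with `scale` 20.0 and cosine similarity, plus a `WarmupLinear` scheduler with 6290 warmup steps. The sketch below illustrates both ideas in plain Python under stated assumptions: the function names are hypothetical, the loss is written as a mean cross-entropy over in-batch candidates, and the total step count assumes one optimizer step per DataLoader batch (12580 batches × 5 epochs), which the README implies through `"steps_per_epoch": null` but does not state outright.

```python
import math


def cos_sim(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i].

    Every other positive in the batch serves as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cos_sim(anchor, p) for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_z - scores[i]
    return total / len(anchors)


def warmup_linear_factor(step, warmup_steps, total_steps):
    """Learning-rate multiplier: linear ramp up to 1.0, then linear decay to 0.0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Values taken from the README; TOTAL_STEPS is derived under the assumption above.
STEPS_PER_EPOCH = 12580
EPOCHS = 5
WARMUP_STEPS = 6290
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS  # 62900, so warmup covers the first 10%

# Toy batch: each anchor is identical to its own positive, so the loss is near zero.
toy_anchors = [[1.0, 0.0], [0.0, 1.0]]
toy_positives = [[1.0, 0.0], [0.0, 1.0]]
print(multiple_negatives_ranking_loss(toy_anchors, toy_positives))
print(warmup_linear_factor(3145, WARMUP_STEPS, TOTAL_STEPS))  # 0.5
```

With a scale of 20.0, a cosine gap of 1.0 between the correct and incorrect candidate becomes a 20-point logit gap, which is why the toy loss is tiny when pairs match and close to 20 when they are swapped.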
config.json
ADDED
@@ -0,0 +1,32 @@
{
  "_name_or_path": "/new_disk2/changle_qu/beir-main/PLMs/msmarco-bert-co-condensor",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.28.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
config_sentence_transformers.json
ADDED
@@ -0,0 +1,7 @@
{
  "__version__": {
    "sentence_transformers": "2.2.2",
    "transformers": "4.28.1",
    "pytorch": "1.13.1+cu117"
  }
}
eval/Information-Retrieval_evaluation_eval_results.csv
ADDED
@@ -0,0 +1,16 @@
epoch,steps,cos_sim-Accuracy@1,cos_sim-Accuracy@3,cos_sim-Accuracy@5,cos_sim-Accuracy@10,cos_sim-Precision@1,cos_sim-Recall@1,cos_sim-Precision@3,cos_sim-Recall@3,cos_sim-Precision@5,cos_sim-Recall@5,cos_sim-Precision@10,cos_sim-Recall@10,cos_sim-MRR@10,cos_sim-NDCG@10,cos_sim-MAP@100,dot_score-Accuracy@1,dot_score-Accuracy@3,dot_score-Accuracy@5,dot_score-Accuracy@10,dot_score-Precision@1,dot_score-Recall@1,dot_score-Precision@3,dot_score-Recall@3,dot_score-Precision@5,dot_score-Recall@5,dot_score-Precision@10,dot_score-Recall@10,dot_score-MRR@10,dot_score-NDCG@10,dot_score-MAP@100
0,5000,0.547365988909427,0.7324399260628466,0.791358595194085,0.8524722735674677,0.547365988909427,0.23011591189155883,0.3421133703019101,0.42144562538508934,0.2525415896487985,0.5139537122612446,0.15411275415896492,0.6209219038817004,0.6518358217439749,0.5403711502060098,0.45783229717688606,0.5243761552680222,0.7168438077634011,0.7800369685767098,0.8433456561922366,0.5243761552680222,0.220608056069008,0.3323706099815157,0.40976008934072705,0.24519408502772644,0.4993126155268021,0.1508780036968577,0.6086163739987677,0.6327616590382269,0.5248054076508489,0.4425039719604371
0,10000,0.5901109057301294,0.7706792975970426,0.8261321626617375,0.8785813308687616,0.5901109057301294,0.24732362908194702,0.37299753542821934,0.4576420979667283,0.2734519408502773,0.5537392174984597,0.16611598890942703,0.6644601047443006,0.6911249284834036,0.5816379003822459,0.49697601633787525,0.570586876155268,0.7584334565619224,0.81862292051756,0.8736136783733827,0.570586876155268,0.23965650030807148,0.36271565003080714,0.4452248921749846,0.2666820702402958,0.5395506007393714,0.16384011090573017,0.6556338570548368,0.6765348102426397,0.5693404295977108,0.48395427608682007
0,-1,0.6051293900184843,0.7851201478743068,0.8376848428835489,0.886090573012939,0.6051293900184843,0.25377195009242143,0.3847812692544671,0.4721984750462107,0.28232439926062847,0.5703712261244609,0.1714302218114603,0.684829405422058,0.7047730906903698,0.5990869228015125,0.5143357972040759,0.586645101663586,0.7750693160813309,0.8313308687615527,0.8805452865064695,0.586645101663586,0.2461818391866913,0.3749614910659273,0.45982555452865065,0.2757624768946396,0.5573475046210721,0.16880776340110906,0.6744531731361676,0.6907447444473741,0.5859247933629872,0.5002992841439944
1,5000,0.5999306839186691,0.7858133086876156,0.8390711645101664,0.8908271719038817,0.5999306839186691,0.25016558841651265,0.3909812076401725,0.47756276956253857,0.28763863216266183,0.5797693314849045,0.17567005545286507,0.6994396950092422,0.702379072778214,0.6067679976925308,0.5221348502558842,0.588724584103512,0.7773798521256932,0.8359519408502772,0.8922134935304991,0.588724584103512,0.24563693776956252,0.3842036352433764,0.46946819162045594,0.28454251386321633,0.5729821318545902,0.17444547134935307,0.6952402957486137,0.6944785523574759,0.5997892633564369,0.5141934801734918
1,10000,0.6245378927911276,0.7996765249537893,0.8536275415896488,0.9030730129390019,0.6245378927911276,0.26111560382008625,0.4029574861367837,0.4934881392483056,0.2970194085027727,0.5984249845964263,0.18128465804066543,0.7218153111521873,0.7225511985447247,0.627729195650237,0.5423889030899157,0.6177218114602587,0.7975970425138632,0.8543207024029574,0.902148798521257,0.6177218114602587,0.2581465650030807,0.40010782501540354,0.4898413431916204,0.295956561922366,0.5960085489833642,0.18048752310536045,0.7189406192236598,0.7179194906551606,0.623431564603359,0.5376323396475766
1,-1,0.6359750462107209,0.8123844731977818,0.8637939001848429,0.9061922365988909,0.6359750462107209,0.2657655576093654,0.41620455945779417,0.5085355052372149,0.30646950092421443,0.6165164818237832,0.18518946395563773,0.735805606900801,0.7328194682833048,0.6415828278128572,0.556254749229156,0.6208410351201479,0.8059149722735675,0.8554759704251387,0.9053835489833642,0.6208410351201479,0.25933264017252006,0.40854128157732594,0.4993357208872458,0.30078558225508323,0.6046518792359827,0.18341035120147878,0.728752695625385,0.72204741916791,0.6315733383267215,0.5453192417315094
2,5000,0.6336645101663586,0.8182763401109058,0.8692236598890942,0.9134704251386322,0.6336645101663586,0.26413277880468267,0.4207486136783734,0.5137438385705483,0.3108133086876156,0.6237811922365989,0.18924445471349358,0.7509453943314848,0.7347628399788712,0.6494148196481458,0.561944906639205,0.6258086876155268,0.8157347504621072,0.8678373382624769,0.9125462107208873,0.6258086876155268,0.26099622612446083,0.4159735058533579,0.5084411583487369,0.3090804066543439,0.6204270640788663,0.18811229205175603,0.7464186691312383,0.7296023219053442,0.6441727743936881,0.5565738973513467
2,10000,0.6538817005545287,0.8272874306839186,0.8785813308687616,0.9152033271719039,0.6538817005545287,0.2728762322858903,0.4335335797905114,0.5289837492298213,0.3192929759704252,0.6400820240295749,0.1918322550831793,0.760435921133703,0.7492817625355696,0.664130252525087,0.5790998282458079,0.6446395563770795,0.8282116451016636,0.8740757855822551,0.9152033271719039,0.6446395563770795,0.26896565003080714,0.4313385705483672,0.5260435921133704,0.31545748613678376,0.6326247689463954,0.1906192236598891,0.7555895717806531,0.7435877123492598,0.6580660232413675,0.5724307278317984
2,-1,0.654228280961183,0.8398798521256932,0.8866682070240296,0.922481515711645,0.654228280961183,0.27288971041281573,0.44304528650646946,0.5394312230437461,0.3271719038817006,0.654819393099199,0.19572550831792979,0.7747804990757856,0.753795009608892,0.6741412630873878,0.5880719173615376,0.6470656192236599,0.836760628465804,0.8823937153419593,0.922481515711645,0.6470656192236599,0.2692544670363524,0.43923290203327164,0.5340919593345655,0.3236367837338263,0.6474468576709796,0.19495147874306842,0.7717074861367837,0.7487518520963491,0.6691584332042485,0.5823762028575432
3,5000,0.6657809611829945,0.8455406654343808,0.8893253234750462,0.926409426987061,0.6657809611829945,0.27798251694393095,0.45174830560690077,0.5503562076401725,0.332786506469501,0.6666262322858904,0.19860212569316085,0.7858229359211337,0.7623335863920392,0.6855246939489122,0.6006410545580767,0.6648567467652495,0.845771719038817,0.8887476894639557,0.9266404805914972,0.6648567467652495,0.27726625077017863,0.44889864448552064,0.5467999075785582,0.3310998151571165,0.6632567005545287,0.19792051756007395,0.7828057609365371,0.7611623646685988,0.6828136528573054,0.5977714085433228
3,10000,0.6739833641404805,0.8518946395563771,0.8946395563770795,0.929413123844732,0.6739833641404805,0.28091882316697475,0.4591420209488601,0.5588493530499077,0.3385859519408503,0.6774029574861368,0.2017906654343808,0.796901956253851,0.7693513353431284,0.6954747718926209,0.6101676843125554,0.6682070240295749,0.8486598890942699,0.893137707948244,0.9280268022181146,0.6682070240295749,0.2784022643253234,0.4542898952556993,0.5532386013555144,0.3366451016635861,0.6737811922365989,0.20101663585951943,0.7938809303758473,0.7646476065780546,0.6914392038031338,0.6060166331150262
3,-1,0.6732902033271719,0.8501617375231053,0.8940619223659889,0.9292975970425139,0.6732902033271719,0.2804451632778805,0.45945009242144175,0.559259473197782,0.3390942698706101,0.6781808379544055,0.20213724584103515,0.7977491528034503,0.7687348256462124,0.6960990291172859,0.6113913618950404,0.6721349353049908,0.8470425138632163,0.8917513863216266,0.9280268022181146,0.6721349353049908,0.2798059149722736,0.45606130622304375,0.5546480283425755,0.33719963031423295,0.6745224892174984,0.20177911275415897,0.7963204713493531,0.7665948291670297,0.6937364446255225,0.6085322966266078
4,5000,0.6819547134935305,0.8545517560073937,0.8970656192236599,0.9298752310536045,0.6819547134935305,0.28420556069008013,0.46526494146642017,0.5658887861983981,0.3425138632162662,0.6840938847812692,0.20358133086876157,0.8030826401725201,0.7745416519965921,0.702804990544695,0.6190669169533174,0.6768715341959335,0.8544362292051756,0.8963724584103512,0.9291820702402958,0.6768715341959335,0.2821241528034504,0.46364756623536657,0.5633664510166357,0.3413354898336414,0.6812942852741836,0.20335027726432536,0.8022912815773259,0.7712486796936837,0.7004325983157041,0.6159636703438823
4,10000,0.6909658040665434,0.8604436229205176,0.9003003696857671,0.9309149722735675,0.6909658040665434,0.28818930991990144,0.4745455945779421,0.5769793592113371,0.34792051756007397,0.6945721657424522,0.20557994454713496,0.8104705791743684,0.7815427595868869,0.7120336678051099,0.6295564578901608,0.6872689463955638,0.8603280961182994,0.9005314232902033,0.93137707948244,0.6872689463955638,0.2863697627849661,0.4709257547751078,0.5723890942698706,0.3472966728280962,0.6930106284658041,0.20532578558225512,0.809326863832409,0.7789486785934286,0.7090074770240424,0.6253329377410477
4,-1,0.6886552680221811,0.859865988909427,0.9007624768946395,0.9310304990757856,0.6886552680221811,0.2872265865680838,0.47439155884165124,0.5771449476278496,0.3477587800369686,0.6944855206407887,0.20577634011090573,0.8112022489217499,0.7802448251327598,0.7117177780609651,0.6290004410942484,0.6865757855822551,0.8600970425138632,0.9007624768946395,0.9312615526802218,0.6865757855822551,0.2860982747997535,0.47161891558841645,0.5734654189772026,0.34725046210720895,0.6931665896487985,0.20542975970425142,0.8097889710412816,0.7785674401461083,0.7091436210108051,0.6254649050409521
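The evaluation CSV above logs one row per evaluation, so a short script can pick the best checkpoint by any metric column. The following is a sketch only: `best_row` is a hypothetical helper, and `SAMPLE` is a three-row, three-column excerpt of the epoch-4 rows copied from the file, not the full table. It also assumes that `steps` equal to `-1` marks the end-of-epoch evaluation.

```python
import csv
import io

# Excerpt of the epoch-4 rows, keeping only the cos_sim-NDCG@10 column.
SAMPLE = """epoch,steps,cos_sim-NDCG@10
4,5000,0.702804990544695
4,10000,0.7120336678051099
4,-1,0.7117177780609651
"""


def best_row(csv_text, metric="cos_sim-NDCG@10"):
    """Return the row (as a dict of strings) with the highest value for `metric`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return max(rows, key=lambda row: float(row[metric]))


best = best_row(SAMPLE)
print(best["epoch"], best["steps"], best["cos_sim-NDCG@10"])
```

In this excerpt the evaluation at step 10000 of epoch 4 scores marginally higher than the end-of-epoch evaluation. To run the helper on the real file, pass its contents instead of `SAMPLE`, for example the text read from `eval/Information-Retrieval_evaluation_eval_results.csv`.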
modules.json
ADDED
@@ -0,0 +1,14 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:727c7cabb5a322a7ed6454b28d5659f1611c03765156b397c585590ae6ac4953
size 438000173
sentence_bert_config.json
ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 350,
  "do_lower_case": false
}
special_tokens_map.json
ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,15 @@
{
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt
ADDED
The diff for this file is too large to render.