Model architecture is similar to bert-medium (8 layers, 8 heads, and 512 hidden size).
The details can be found in this paper:
https://arxiv.org/...

The following code segment can be used to initialize the tokenizer; the example max length (514) can be changed:

```python
from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast(tokenizer_file=[file_path])  # path to the tokenizer file
tokenizer.mask_token = "[MASK]"
tokenizer.cls_token = "[CLS]"
tokenizer.sep_token = "[SEP]"
tokenizer.pad_token = "[PAD]"
tokenizer.unk_token = "[UNK]"
tokenizer.bos_token = "[CLS]"
tokenizer.eos_token = "[SEP]"
tokenizer.model_max_length = 514
```
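As a self-contained sketch of how the initialized tokenizer behaves, the example below builds a toy word-level vocabulary in place of the model's real tokenizer file (the vocabulary, and the use of `tokenizer_object` instead of `tokenizer_file`, are illustrative assumptions, not part of this model):

```python
from tokenizers import Tokenizer, models, pre_tokenizers
from transformers import PreTrainedTokenizerFast

# Toy word-level tokenizer purely for illustration; the real model ships
# its own tokenizer file, which should be loaded via tokenizer_file instead.
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "[MASK]": 4,
         "hello": 5, "world": 6}
tok = Tokenizer(models.WordLevel(vocab, unk_token="[UNK]"))
tok.pre_tokenizer = pre_tokenizers.Whitespace()

tokenizer = PreTrainedTokenizerFast(tokenizer_object=tok)
tokenizer.mask_token = "[MASK]"
tokenizer.cls_token = "[CLS]"
tokenizer.sep_token = "[SEP]"
tokenizer.pad_token = "[PAD]"
tokenizer.unk_token = "[UNK]"
tokenizer.model_max_length = 514

# Encode a sentence; with no post-processor configured, no special tokens
# are inserted, so only the word ids come back.
ids = tokenizer("hello world")["input_ids"]
print(ids)
```

Inputs longer than `model_max_length` will trigger truncation warnings once truncation is enabled, which is why the 514 limit should match the model's position-embedding size.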

### BibTeX entry and citation info
```bibtex
@article{}
```