ctoraman committed on
Commit a796138 · 1 Parent(s): da3ff5e

readme updated

Files changed (1): README.md +13 -0
README.md CHANGED
@@ -18,6 +18,19 @@ Model architecture is similar to bert-medium (8 layers, 8 heads, and 512 hidden
  The details can be found at this paper:
  https://arxiv.org/...

+ The following code segment can be used to initialize the tokenizer; the example max length (514) can be changed:
+ ```
+ from transformers import PreTrainedTokenizerFast
+
+ tokenizer = PreTrainedTokenizerFast(tokenizer_file=[file_path])
+ tokenizer.mask_token = "[MASK]"
+ tokenizer.cls_token = "[CLS]"
+ tokenizer.sep_token = "[SEP]"
+ tokenizer.pad_token = "[PAD]"
+ tokenizer.unk_token = "[UNK]"
+ tokenizer.bos_token = "[CLS]"
+ tokenizer.eos_token = "[SEP]"
+ tokenizer.model_max_length = 514
+ ```
+
  ### BibTeX entry and citation info
  ```bibtex
  @article{}
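
The snippet added by this commit loads a tokenizer from a `tokenizer.json` file and then sets the special tokens by hand. As a sanity check of that pattern, the sketch below first builds a toy word-level tokenizer file (the tiny vocabulary and the file name `toy_tokenizer.json` are assumptions for illustration; a real model ships its own tokenizer file) and then initializes it exactly as the README describes:

```python
# Minimal sketch: create a toy tokenizer file, then load it the way the
# README snippet does. The vocabulary below is a made-up example.
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "[MASK]": 4,
         "hello": 5, "world": 6}
tok = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
tok.pre_tokenizer = Whitespace()
tok.save("toy_tokenizer.json")  # stand-in for the model's tokenizer file

# Initialization as in the README; [file_path] is replaced by the toy file.
tokenizer = PreTrainedTokenizerFast(tokenizer_file="toy_tokenizer.json")
tokenizer.mask_token = "[MASK]"
tokenizer.cls_token = "[CLS]"
tokenizer.sep_token = "[SEP]"
tokenizer.pad_token = "[PAD]"
tokenizer.unk_token = "[UNK]"
tokenizer.bos_token = "[CLS]"
tokenizer.eos_token = "[SEP]"
tokenizer.model_max_length = 514

ids = tokenizer("hello world")["input_ids"]
print(ids)
```

Note that without a post-processor configured in the tokenizer file, no `[CLS]`/`[SEP]` tokens are inserted automatically; setting the attributes only registers which vocabulary entries play those roles.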