gchhablani committed
Commit 20eea66 • 1 Parent(s): 0e74c3e

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -60,7 +60,7 @@ We used 99% of the 10M examples as a train set, and the remaining ~ 100K example
  ## Training procedure 👨🏻‍💻
  ### Preprocessing
- The texts are lowercased and tokenized using WordPiece and a shared vocabulary size of approximately 110,000. The beginning of a new document is marked with `[CLS]` and the end of one by `[CLS]`
+ The texts are lowercased and tokenized using WordPiece and a shared vocabulary size of approximately 110,000. The beginning of a new document is marked with `[CLS]` and the end of one by `[SEP]`
  The details of the masking procedure for each sentence are the following:
  - 15% of the tokens are masked.
  - In 80% of the cases, the masked tokens are replaced by `[MASK]`.
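
For reference, the corrected sentence and the masking bullets visible in this hunk can be sketched roughly as below. This is a minimal illustration, not the model's actual preprocessing code. The names `mark_document` and `mask_tokens` are hypothetical, and tokens are assumed to be already-split strings rather than real WordPiece output. Skipping `[CLS]` and `[SEP]` during masking is an assumption. Because the hunk ends at the 80% bullet, the sketch leaves the remaining selected tokens unchanged rather than guessing how the README handles them.

```python
import random

CLS, SEP, MASK = "[CLS]", "[SEP]", "[MASK]"


def mark_document(tokens):
    """Wrap one document's tokens with a start and an end marker."""
    return [CLS] + list(tokens) + [SEP]


def mask_tokens(tokens, rng, select_prob=0.15, mask_prob=0.8):
    """Select about 15% of tokens and replace about 80% of those with [MASK].

    Selection is an independent draw per token, so the 15% holds only in
    expectation. Special markers are never selected (an assumption).
    Returns the masked token list and the list of selected positions.
    """
    masked = list(tokens)
    selected = []
    for i, tok in enumerate(tokens):
        if tok in (CLS, SEP):
            continue
        if rng.random() < select_prob:
            selected.append(i)
            if rng.random() < mask_prob:
                masked[i] = MASK
            # Otherwise the token is left as-is in this sketch.
    return masked, selected


doc = mark_document(["the", "quick", "brown", "fox", "jumps"])
masked, selected = mask_tokens(doc, random.Random(0))
print(doc)
print(masked, selected)
```

Passing a seeded `random.Random` instance keeps the sketch reproducible without touching the global random state.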