wissamantoun committed on
Commit
590a87e
1 Parent(s): c655777

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -46,9 +46,9 @@ All models are available in the `HuggingFace` model page under the [aubmindlab](
 
  We identified an issue with AraBERTv1's wordpiece vocabulary. The issue came from punctuations and numbers that were still attached to words when learned the wordpiece vocab. We now insert a space between numbers and characters and around punctuation characters.
 
- The new vocabulary was learnt using the `BertWordpieceTokenizer` from the `tokenizers` library, and should now support the Fast tokenizer implementation from the `transformers` library.
+ The new vocabulary was learned using the `BertWordpieceTokenizer` from the `tokenizers` library, and should now support the Fast tokenizer implementation from the `transformers` library.
 
- **P.S.**: All the old BERT codes should work with the new BERT, just change the model name and check the new preprocessing dunction
+ **P.S.**: All the old BERT codes should work with the new BERT, just change the model name and check the new preprocessing function
  **Please read the section on how to use the [preprocessing function](#Preprocessing)**
 
  ## Bigger Dataset and More Compute
@@ -86,7 +86,7 @@ It is recommended to apply our preprocessing function before training/testing on
  ```python
  from arabert.preprocess import ArabertPreprocessor
 
- model_name="bert-base-arabertv02"
+ model_name="aubmindlab/bert-large-arabertv02"
  arabert_prep = ArabertPreprocessor(model_name=model_name)
 
  text = "ولن نبالغ إذا قلنا: إن هاتف أو كمبيوتر المكتب في زمننا هذا ضروري"