ctoraman committed
Commit 9148154
9148154
1 Parent(s): cf983c2

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -8,12 +8,12 @@ datasets:
 - oscar
 ---

-# RoBERTa Turkish medium Character-level 16k (uncased)
+# RoBERTa Turkish medium Character-level (uncased)

 Pretrained model on Turkish language using a masked language modeling (MLM) objective. The model is uncased.
 The pretrained corpus is OSCAR's Turkish split, but it is further filtered and cleaned.

-Model architecture is similar to bert-medium (8 layers, 8 heads, and 512 hidden size). Tokenization algorithm is Character-level, which means that text is split by individual characters. Vocabulary size is 16.7k.
+Model architecture is similar to bert-medium (8 layers, 8 heads, and 512 hidden size). Tokenization algorithm is Character-level, which means that text is split by individual characters. Vocabulary size is 384.

 ## Note that this model does not include a tokenizer file, because it uses ByT5Tokenizer. The following code can be used for model loading and tokenization, example max length(1024) can be changed:
 ```
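The updated vocabulary size of 384 is consistent with the byte-level scheme that ByT5Tokenizer uses: 3 special tokens, followed by one id per possible UTF-8 byte (256), plus 125 extra ids. A minimal sketch of that mapping, assuming ByT5's convention of offsetting byte values past the special tokens (the offset is an assumption based on ByT5, not something stated in this diff):

```python
def byte_tokenize(text: str, num_special: int = 3) -> list[int]:
    """Map each UTF-8 byte of the text to an id, offset past the
    special tokens -- mirroring how a byte-level tokenizer like
    ByT5Tokenizer handles arbitrary text without a vocabulary file."""
    return [b + num_special for b in text.encode("utf-8")]

# 3 special tokens + 256 byte ids + 125 extra ids = 384 vocabulary entries
ids = byte_tokenize("abc")   # → [100, 101, 102], since 'a' is byte 97
ids_tr = byte_tokenize("ş")  # Turkish 'ş' is two UTF-8 bytes, so two ids
```

Because every id is derived directly from the byte value, no tokenizer file needs to ship with the model, which matches the note above about the missing tokenizer file.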