conan1024hao committed on
Commit
3d66092
1 Parent(s): c799aa4

update readme

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -32,7 +32,7 @@ You can fine-tune this model on downstream tasks.
 
 ## Tokenization
 
-`BertJapaneseTokenizer` now supports automatic tokenization for [Juman++](https://github.com/ku-nlp/jumanpp). However, if your dataset is large, tokenization may take a long time, since `BertJapaneseTokenizer` does not yet support fast tokenization. You can still do the Juman++ tokenization yourself and use the old model [nlp-waseda/roberta-large-japanese](https://huggingface.co/nlp-waseda/roberta-large-japanese).
+`BertJapaneseTokenizer` now supports automatic tokenization for [Juman++](https://github.com/ku-nlp/jumanpp). However, if your dataset is large, tokenization may take a long time, since `BertJapaneseTokenizer` does not yet support fast tokenization. You can still do the Juman++ tokenization yourself and use the old model [nlp-waseda/roberta-large-japanese-seq512](https://huggingface.co/nlp-waseda/roberta-large-japanese-seq512).
 
 Juman++ 2.0.0-rc3 was used for pretraining. Each word is tokenized into tokens by [sentencepiece](https://github.com/google/sentencepiece).
 
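
For reference, a minimal sketch of the manual Juman++ workflow the changed line describes, assuming the `transformers` library and a local Juman++ install; the repo ID is the one named in the diff, and the example sentence and its segmentation are illustrative only:

```python
from transformers import AutoTokenizer

# Manual path from the changed line: run Juman++ yourself, join the resulting
# words with single spaces, and feed the pre-segmented text to the old model's
# sentencepiece-based tokenizer.
tokenizer = AutoTokenizer.from_pretrained("nlp-waseda/roberta-large-japanese-seq512")

segmented = "早稲田 大学 で 自然 言語 処理 を 研究 する"  # whitespace-joined Juman++ output (illustrative)
print(tokenizer.tokenize(segmented))

# Automatic path: a repo whose tokenizer is configured as BertJapaneseTokenizer with
# word_tokenizer_type="jumanpp" can be loaded the same way and will call Juman++
# internally, so no pre-segmentation is needed (Juman++ must still be installed locally).
```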