# bert-base-sudachitra-v11

This model is a variant of SudachiTra. The differences between the original `chiTra v1.1` and `bert-base-sudachitra-v11` are:

- `word_form_type` was changed from `normalized_nouns` to `surface` (illustrated at the end of this README)

*(See [GitHub - WorksApplications/SudachiTra](https://github.com/WorksApplications/SudachiTra) for the latest README)*

# Sudachi Transformers (chiTra)

chiTra provides pre-trained language models and a Japanese tokenizer for [Transformers](https://github.com/huggingface/transformers).

## chiTra pretrained language model

We used the [NINJAL Web Japanese Corpus (NWJC)](https://pj.ninjal.ac.jp/corpus_center/nwjc/) from the National Institute for Japanese Language and Linguistics, which contains text from around 100 million web pages. NWJC was cleaned to remove unnecessary sentences before use. The model was trained as a BERT model using the pre-training scripts implemented by [NVIDIA](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/LanguageModeling/BERT).

## License

Copyright (c) 2022 National Institute for Japanese Language and Linguistics and Works Applications Co., Ltd. All rights reserved.

"chiTra" is distributed by the [National Institute for Japanese Language and Linguistics](https://www.ninjal.ac.jp/) and [Works Applications Co., Ltd.](https://www.worksap.co.jp/) under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
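## Usage example

The following is a minimal sketch, not an official example from this repository. It assumes the `sudachitra` package is installed (`pip install sudachitra`) and that `MODEL_PATH` is a placeholder for the actual local path or Hugging Face Hub identifier of this model.

```python
# Minimal usage sketch (MODEL_PATH is a placeholder; substitute the real
# location of this model). Requires: pip install sudachitra transformers torch
from transformers import BertModel
from sudachitra.tokenization_bert_sudachipy import BertSudachipyTokenizer

MODEL_PATH = "path/to/bert-base-sudachitra-v11"  # placeholder path

tokenizer = BertSudachipyTokenizer.from_pretrained(MODEL_PATH)
model = BertModel.from_pretrained(MODEL_PATH)

# Since word_form_type is `surface`, tokens keep their written form.
inputs = tokenizer("まさにオールマイティーな商品だ。", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, seq_len, 768])
```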
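## Word forms: surface vs. normalized

To illustrate the `word_form_type` difference noted at the top of this README, here is a small SudachiPy sketch (assuming `sudachipy` and `sudachidict_core` are installed; the input sentence is arbitrary) that prints each morpheme's surface form next to its normalized form:

```python
# Illustrative sketch only: compares Sudachi's surface forms (which this model
# uses) with normalized forms (which chiTra v1.1's normalized_nouns builds on).
# Requires: pip install sudachipy sudachidict_core
from sudachipy import Dictionary, SplitMode

sudachi = Dictionary().create()
for m in sudachi.tokenize("引越しするので掃除機を買った", SplitMode.C):
    # surface(): the token exactly as written in the input text;
    # normalized_form(): the dictionary's canonical spelling for the token.
    print(m.surface(), m.normalized_form())
```

With `surface`, the tokenizer feeds the model tokens exactly as they appear in the text; with `normalized_nouns`, noun variants are mapped to a canonical dictionary spelling first.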