kaisugi committed
Commit da40370
Parent: 976f5d4

add tips in README

Files changed (1): README.md (+26, -1)
README.md CHANGED
@@ -28,9 +28,34 @@ This model is released under the [Creative Commons 4.0 International License](ht
  # download Manbyo-Dictionary
  
  mkdir -p /usr/local/lib/mecab/dic/userdic
- wget https://sociocom.jp/~data/2018-manbyo/data/MANBYO_201907_Dic-utf8.dic && mv MANBYO_201907_Dic-utf8.dic /usr/local/lib/mecab/dic/userdic
+ wget https://sociocom.jp/~data/2018-manbyo/data/MANBYO_201907_Dic-utf8.dic
+ mv MANBYO_201907_Dic-utf8.dic /usr/local/lib/mecab/dic/userdic
  ```
  
+ ---
+ 
+ **Note: If you don't have root privileges and cannot place the Manbyo Dictionary under `/usr/local/lib/mecab/dic/userdic`, you can still load our model by overriding the tokenizer settings as follows:**
+ 
+ ```bash
+ # download the Manbyo Dictionary to any directory you like
+ 
+ wget https://sociocom.jp/~data/2018-manbyo/data/MANBYO_201907_Dic-utf8.dic
+ mv MANBYO_201907_Dic-utf8.dic /anywhere/you/like
+ ```
+ 
+ ```python
+ from transformers import AutoModelForMaskedLM, AutoTokenizer
+ 
+ model = AutoModelForMaskedLM.from_pretrained("alabnii/jmedroberta-base-manbyo-wordpiece-vocab50000")
+ tokenizer = AutoTokenizer.from_pretrained("alabnii/jmedroberta-base-manbyo-wordpiece-vocab50000", **{
+     "mecab_kwargs": {
+         "mecab_option": "-u /anywhere/you/like/MANBYO_201907_Dic-utf8.dic"
+     }
+ })
+ ```
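+ 
+ To confirm that the user dictionary is actually picked up, you can tokenize a sample sentence with the tokenizer loaded above (a minimal sketch; the sample sentence is only an illustration, and the exact split depends on your MeCab installation):
+ 
+ ```python
+ # Continuing from the snippet above: `tokenizer` was created with the -u option
+ # pointing at the Manbyo dictionary, so disease names such as 急性心筋梗塞
+ # should be segmented according to that dictionary.
+ print(tokenizer.tokenize("急性心筋梗塞と診断された。"))
+ ```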
+ 
+ ---
+ 
  **Input text must be converted to full-width characters (全角) in advance.**
  
  You can use this model for masked language modeling as follows:
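  
  For example (a minimal sketch; the sample sentence is illustrative only):
  
  ```python
  from transformers import pipeline
  
  # Build a fill-mask pipeline from the model id. This assumes the Manbyo
  # dictionary is installed at the default path described above; otherwise,
  # also pass the tokenizer configured with mecab_kwargs via tokenizer=...
  fill_mask = pipeline("fill-mask", model="alabnii/jmedroberta-base-manbyo-wordpiece-vocab50000")
  
  # Input must be full-width (全角) text; [MASK] marks the position to predict.
  print(fill_mask("この患者は[MASK]と診断された。"))
  ```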