Jinhwan committed
Commit 9762b4b
1 Parent(s): a9793bd

Update README.md

Files changed (1)
  1. README.md +21 -1
README.md CHANGED
@@ -8,4 +8,24 @@ tags:
  # KrELECTRA-base-mecab
  Korean-based Pre-trained ELECTRA Language Model using Mecab (Morphological Analyzer)
 
- For more detail, please see [original repository](https://github.com/monologg/KoELECTRA/blob/master/README_EN.md).
+ For more detail, please see [original repository](https://github.com/monologg/KoELECTRA/blob/master/README_EN.md).
+
+ ## Usage
+
+ ### Load model and tokenizer
+
+ ```python
+ >>> from transformers import AutoTokenizer, AutoModelForPreTraining
+ >>> model = AutoModelForPreTraining.from_pretrained("Jinhwan/krelectra-base-mecab")
+ >>> tokenizer = AutoTokenizer.from_pretrained("Jinhwan/krelectra-base-mecab")
+ ```
+
+ ### Tokenizer example
+
+ ```python
+ >>> from transformers import AutoTokenizer
+ >>> tokenizer = AutoTokenizer.from_pretrained("Jinhwan/krelectra-base-mecab")
+ >>> tokenizer.tokenize("[CLS] 한국어 ELECTRA를 공유합니다. [SEP]")
+ ['[CLS]', '한국어', 'EL', '##ECT', '##RA', '##를', '공유', '##합', '##니다', '.', '[SEP]']
+ >>> tokenizer.convert_tokens_to_ids(['[CLS]', '한국어', 'EL', '##ECT', '##RA', '##를', '공유', '##합', '##니다', '.', '[SEP]'])
+ [2, 7214, 24023, 24663, 26580, 3195, 7086, 3746, 5500, 17, 3]
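
Note (not part of the commit): the `##` pieces in the tokenizer example added above look like WordPiece-style subwords. As a rough illustration only, the sketch below applies greedy longest-match-first splitting over a toy vocabulary that contains just the tokens and ids shown in the diff. `TOY_VOCAB`, `wordpiece`, and `toy_tokenize` are hypothetical names introduced here. The real tokenizer has a much larger vocabulary and may pre-segment text differently (the model name suggests Mecab is involved), so treat this as a sketch under those assumptions and not as the model's actual implementation.

```python
import re

# Toy vocabulary (assumption): only the pieces and ids that appear in the
# tokenizer example from the diff. The real vocabulary is far larger.
TOY_VOCAB = {
    "[CLS]": 2, "[SEP]": 3, ".": 17,
    "한국어": 7214, "EL": 24023, "##ECT": 24663, "##RA": 26580,
    "##를": 3195, "공유": 7086, "##합": 3746, "##니다": 5500,
}
SPECIAL_TOKENS = ("[CLS]", "[SEP]")


def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry "##"
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Whitespace split, keep special tokens whole, split off punctuation."""
    tokens = []
    for chunk in text.split():
        if chunk in SPECIAL_TOKENS:
            tokens.append(chunk)
            continue
        for word in re.findall(r"\w+|[^\w\s]", chunk):
            tokens.extend(wordpiece(word, vocab))
    return tokens


tokens = toy_tokenize("[CLS] 한국어 ELECTRA를 공유합니다. [SEP]")
ids = [TOY_VOCAB[t] for t in tokens]
print(tokens)
print(ids)
```

With this toy vocabulary the sketch reproduces the token list and ids shown in the diff. Any word whose pieces are missing from `TOY_VOCAB` collapses to `[UNK]`.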